[zfs-code] Design for EAs in dnode.

Ricardo M. Correia Thu, 07 Feb 2008 17:54:22 +0000

On Thu, 2008-02-07 at 10:29 -0700, Mark Shellenbaum wrote:

> Ok, but we don't know how many attributes future consumers may want to 
> store.  I hope it will be small, but I wouldn't want to paint ourselves 
> in a corner and then wish we had done it a different way.



That would be a matter of falling back to the current mechanism if we
reach a sufficiently large number of attributes (>100?).
We are simply interested in optimizing for the large majority of cases.


> > We could also provide a similar interface to add/update an attribute 
> > that would manipulate the encoded format directly (the XDR format looks 
> > fairly simple, I think).
> > 
> 
> I think that would be complicated.  You could potentially have a 
> packed buffer that needs to be expanded in size or compressed depending 
> on how the attribute is changed/added.



Yes.. So we have 4 cases:

1) Adding an attribute: we would just add it at the end (very simple).
2) Changing an attribute without changing its size: it could be changed
in-place.
3) Changing an attribute and changing its size: we could use memmove()
or similar to copy the attributes after the one we're changing to the
correct place.
4) Deleting an attribute: we would also use memmove() to copy the
following attributes to the correct place.

We'd have to do a size check before doing an operation to see if it
would still fit in the provided buffer (which would be the bonus
buffer).
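
To make the four cases concrete, here is a minimal sketch in C. It is
only an illustration under stated assumptions: the per-entry layout
([name length][value length][name][value]) is hypothetical and is not
the XDR format discussed above, EA_BUF_CAP is a made-up capacity
standing in for the space available in the bonus buffer, and the
ea_* names are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EA_BUF_CAP 192  /* hypothetical capacity standing in for the bonus buffer */
#define EA_HDR     3    /* hypothetical entry header: 1-byte nlen + 2-byte vlen */

typedef struct ea_buf {
    uint8_t data[EA_BUF_CAP];
    size_t  used;       /* bytes currently occupied by packed entries */
} ea_buf_t;

/* Total size of the packed entry starting at e. */
static size_t ea_entry_size(const uint8_t *e)
{
    uint16_t vlen;
    memcpy(&vlen, e + 1, sizeof (vlen));
    return EA_HDR + e[0] + vlen;
}

/* Return the offset of the entry called name, or -1 if it is absent. */
static long ea_find(const ea_buf_t *b, const char *name)
{
    size_t nlen = strlen(name);
    for (size_t off = 0; off < b->used; off += ea_entry_size(b->data + off)) {
        const uint8_t *e = b->data + off;
        if (e[0] == nlen && memcmp(e + EA_HDR, name, nlen) == 0)
            return (long)off;
    }
    return -1;
}

/* Add or update an attribute. Returns 0, or -1 if it would not fit. */
static int ea_set(ea_buf_t *b, const char *name, const void *val, uint16_t vlen)
{
    size_t nlen = strlen(name);
    if (nlen > UINT8_MAX)
        return -1;
    size_t newsz = EA_HDR + nlen + vlen;
    long off = ea_find(b, name);

    if (off < 0) {
        /* Case 1: new attribute, append at the end after a size check. */
        if (newsz > EA_BUF_CAP - b->used)
            return -1;
        uint8_t *e = b->data + b->used;
        e[0] = (uint8_t)nlen;
        memcpy(e + 1, &vlen, sizeof (vlen));
        memcpy(e + EA_HDR, name, nlen);
        memcpy(e + EA_HDR + nlen, val, vlen);
        b->used += newsz;
        return 0;
    }

    uint8_t *e = b->data + off;
    size_t oldsz = ea_entry_size(e);
    if (newsz != oldsz) {
        /* Case 3: size changes, so shift the following entries. */
        if (newsz > oldsz && newsz - oldsz > EA_BUF_CAP - b->used)
            return -1;
        memmove(e + newsz, e + oldsz, b->used - ((size_t)off + oldsz));
        b->used = b->used - oldsz + newsz;
        memcpy(e + 1, &vlen, sizeof (vlen));
    }
    /* Case 2 (and the tail end of case 3): write the value in place. */
    memcpy(e + EA_HDR + nlen, val, vlen);
    return 0;
}

/* Case 4: delete by moving the following entries down over the hole. */
static int ea_del(ea_buf_t *b, const char *name)
{
    long off = ea_find(b, name);
    if (off < 0)
        return -1;
    uint8_t *e = b->data + off;
    size_t sz = ea_entry_size(e);
    memmove(e, e + sz, b->used - ((size_t)off + sz));
    b->used -= sz;
    return 0;
}
```

In this sketch a caller that gets -1 from ea_set() would be the one to
fall back to the current mechanism. Only cases 3 and 4 pay for a
memmove(), and the buffer never holds padding between entries.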

Whatever format we'd use, I think the important thing is to have as little
space overhead as possible in order to fit in the dnode.
So I think a packed format is probably desirable even if it means it
would not be 100% efficient CPU-wise in some cases (cases 3 and 4).


> > Although we don't compile the majority of the ZPL code due to VFS 
> > dependencies, we do compile zfs_znode.c in userspace (without the 
> > _KERNEL definition).
> > 
> 
> Are you referring to the linux/fuse implementation?  If so, then I would 
> probably suggest we add some ioctl interfaces to retrieve the 
> attributes.  


I'm referring to the Lustre implementation (which has many things in
common with Linux/FUSE due to being implemented in userspace).

We cannot use ioctls() to retrieve attributes because the DMU will be
running completely in userspace.
In Solaris (well, in all Lustre-supported OSes), we export the pool from
the native (kernel) implementation and import it into our userspace
process which runs libzpool in a similar way as ztest.

This means that the native kernel implementation *cannot* have access to
the ZFS pool while Lustre is running.


> there is no need for Solaris to add new VOP interfaces for 
> these.


Great, we don't need them either :)
However, I think we do not have enough experience to fix the existing
Solaris VFS interface to use the new mechanism. Well, we could, but it
would take us much, much longer than if it were you guys :)

Thanks,
Ricardo

--

Ricardo Manuel Correia
Lustre Engineering

Sun Microsystems, Inc.
Portugal
Phone +351.214134023 / x58723
Mobile +351.912590825
Email Ricardo.M.Correia at Sun.COM
