On Sep 14, 2007, at 11:09 AM, Andreas Dilger wrote:

> On Sep 14, 2007  08:52 -0600, Mark Maybee wrote:
>>> Without knowing the details, it would seem at first glance that
>>> having variable dnode size would be fairly complex.  Aren't the
>>> dnodes just stored in a single sparse object and accessed by
>>> dnode_size * objid?  This does seem desirable from the POV that
>>> if you have an existing fs with the current dnode size you don't
>>> want to need a reformat in order to use the larger size.
>>
>> I was referring here to supporting multiple dnode sizes within a
>> *pool*, but the size would still remain fixed for a given dataset
>> (see Bill's mail).  This is a much simpler concept to implement.
>
> Ah, sure.  That would be a lot easier to implement.
>
>>> That is true, and we discussed this internally, but one of the
>>> internal requirements we have for DMU usage is that it create an
>>> on-disk layout that matches ZFS so that it is possible to mount a
>>> Lustre filesystem via ZFS or ZFS-FUSE (and potentially the reverse
>>> in the future).  This will allow us to do problem diagnosis and
>>> also leverage any ZFS scanning/verification tools that may be
>>> developed.
>>
>> Ah, interesting, I was not aware of this requirement.  It would not
>> be difficult to allow the ZPL to work with a larger dnode size (in
>> fact it's pretty much a no-op as long as the ZPL is not trying to
>> use any of the extra space in the dnode).
>
> I agree, but I suspect large dnodes could also be of use to ZFS at
> some point, either for fast EAs and/or small files, so we wanted to
> get some buy-in from the ZFS developers on an approach that would
> be suitable for ZFS also.  In particular, being able to use the
> larger dnode space for a variety of reasons (more elements in
> dn_blkptr[], small file data, fast EA space) is much more desirable
> than a Lustre-only implementation.
>
> Also, being able to access the EAs via the ZPL when mounted as ZFS
> would be important for debugging/backup/restore/etc.
>
> I suspect the Lustre development approach would be the same with ZFS
> as it is with ext3, which has been quite successful to this point.
> Namely, we're happy to develop new functionality in ZFS/DMU as needed
> so long as we get buy-in from the ZFS team on the design and most
> importantly the on-disk format.  We don't want to create a permanent
> fork in the code or on-disk format that separates Lustre-ZFS from
> Solaris-ZFS, which is the whole point to starting this discussion long
> before we're going to start implementing anything.

Absolutely, let's make sure we all agree on the on-disk changes.
This has been a major focus for us when working with the OS X and
FreeBSD people.  So far we've been quite successful (and I don't see
any reason why we won't be in the future).  It's great to hear you
want the same thing.

Another nice thing is that ZFS was designed to support on-disk  
changes (see zpool upgrade).
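
[For reference, that versioning is exposed through the zpool command.
A typical sequence looks like the following; the pool name "tank" is
hypothetical, and upgrading is one-way.]

```shell
# List the on-disk format versions this software release supports.
zpool upgrade -v

# Show which imported pools are running an older on-disk version.
zpool upgrade

# Upgrade one pool (hypothetical name) to the latest supported version.
zpool upgrade tank
```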

eric
