On Sep 14, 2007, at 11:09 AM, Andreas Dilger wrote:

> On Sep 14, 2007 08:52 -0600, Mark Maybee wrote:
>>> Without knowing the details, it would seem at first glance that
>>> having variable dnode size would be fairly complex. Aren't the
>>> dnodes just stored in a single sparse object and accessed by
>>> dnode_size * objid? This does seem desirable from the POV that
>>> if you have an existing fs with the current dnode size you don't
>>> want to need a reformat in order to use the larger size.
>>
>> I was referring here to supporting multiple dnode sizes within a
>> *pool*, but the size would still remain fixed for a given dataset
>> (see Bill's mail). This is a much simpler concept to implement.
>
> Ah, sure. That would be a lot easier to implement.
>
>>> That is true, and we discussed this internally, but one of the
>>> internal requirements we have for DMU usage is that it create an
>>> on-disk layout that matches ZFS so that it is possible to mount a
>>> Lustre filesystem via ZFS or ZFS-FUSE (and potentially the reverse
>>> in the future). This will allow us to do problem diagnosis and also
>>> leverage any ZFS scanning/verification tools that may be developed.
>>
>> Ah, interesting, I was not aware of this requirement. It would not be
>> difficult to allow the ZPL to work with a larger dnode size (in fact
>> it's pretty much a noop as long as the ZPL is not trying to use any of
>> the extra space in the dnode).
>
> I agree, but I suspect large dnodes could also be of use to ZFS at
> some point, either for fast EAs and/or small files, so we wanted to
> get some buy-in from the ZFS developers on an approach that would
> be suitable for ZFS also. In particular, being able to use the larger
> dnode space for a variety of reasons (more elements in dn_blkptr[],
> small file data, fast EA space) is much more desirable than a
> Lustre-only implementation.
>
> Also, being able to access the EAs via the ZPL when mounted as ZFS
> would be important for debugging/backup/restore/etc.
>
> I suspect the Lustre development approach would be the same with ZFS
> as it is with ext3, which has been quite successful to this point.
> Namely, we're happy to develop new functionality in ZFS/DMU as needed
> so long as we get buy-in from the ZFS team on the design and, most
> importantly, the on-disk format. We don't want to create a permanent
> fork in the code or on-disk format that separates Lustre-ZFS from
> Solaris-ZFS, which is the whole point of starting this discussion long
> before we're going to start implementing anything.

Absolutely, let's make sure we all agree on the on-disk changes. This has been a major focus for us when working with the OSX and FreeBSD people. So far we've been quite successful (and I don't see any reason why we won't be in the future). It's great to hear you want the same thing.

Another nice thing is that ZFS was designed to support on-disk changes (see zpool upgrade).

eric
