Hey Ed,

I want to comment on the NFS aspects involved here.

On Thu, May 21, 2009 at 3:55 AM, Edward Pilatowicz wrote:
>
>well, it all depends on what nfs shares are actually being exported.
I definitely think we want to refrain from making too many programmatic
assumptions inside the Zones framework about what an NFS server exports, how
the server's exported namespace may look, and how the NFS client (which runs
the zone) handles those exports upon access as opposed to explicit mounting.
That is more or less okay in the NFS v2/v3 (and helper protocols) world, but
it is not always adequate for the V4 protocol and all the work in V4 and V4.1
towards a unified, global namespace. I'll show why in the context of V4 using
the examples you mentioned below.

>if the nfs server has the following share(s) exported:
>
>nfsserver:/vol
>
>then you would have the following mount(s):
>/var/zones/nfsmount/zone1/nfsserver/vol
>/var/zones/nfsmount/zone2/nfsserver/vol
>/var/zones/nfsmount/zone3/nfsserver/vol
>
>if the nfs server has the following share(s) exported:
>
>nfsserver:/vol/zones
>
>then you would have the following mount(s):
>/var/zones/nfsmount/zone1/nfsserver/vol/zones
>/var/zones/nfsmount/zone2/nfsserver/vol/zones
>/var/zones/nfsmount/zone3/nfsserver/vol/zones

In those two examples we have to consider how the V4 server constructs its
pseudo namespace starting at the server's root, including what we call pseudo
exports that bridge to the real exported share points on the server, and how
the V4 client may handle this.

For instance, on the V4 server the export:

/vol

may (and probably will) have different ZFS datasets that host our zones
underneath /vol, e.g.:

/vol/zone1
/vol/zone2
/vol/zone3

Since they are separate ZFS datasets, we cross file system boundaries while
traversing from the exported server's root / over the share point /vol down
to the share points zone1/zone2/zone3 (also presumably exported, otherwise it
wouldn't be useful in our context anyway). We distinguish between the
different file systems based on the FSID attribute: if it changes, we have
crossed a server file system boundary.

With V2/V3 that stops us: the client cannot travel into the new file system
below the initial mount, and a separate mount would have to be performed
(unless we've explicitly mounted the entire path, of course).

However, with V4 the client has the (in our implementation) so-called Mirror
Mount feature. It allows the client to transparently mount those new file
systems on access below the starting share point /vol (provided they are
shared as well) and make them immediately visible without requiring the user
to perform any additional mounts. Those mirror mounts are done automatically
by our V4 client in the kernel as it detects that it would cross server-side
file system boundaries (based on the FSID) on any access other than
VOP_LOOKUP() or VOP_GETATTR().

I.e. if the global zone already has server:/vol mounted, an attempt by the
zone utilities to access (as opposed to explicitly mount) server:/vol/zone1
will automatically mount server:/vol/zone1 into the client's namespace, and
on the client (nfsstat -m) you'd get two mounts:

server:/vol (already existing regular mount)
server:/vol/zone1 (the mirror mount done by the client)

If we really performed a mount though, that would just introduce the mount of
server:/vol/zone1 into the namespace of the client running the zone.
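To make that 'access as opposed to explicit mounting' distinction a bit more
concrete, here is a minimal userland sketch of what the zone tooling would
observe; the /var/zones/nfsmount layout and all names are simply taken from
the examples above, nothing of this exists today:

    /*
     * Rough illustration only: assumes server:/vol is already mounted at
     * /var/zones/nfsmount/zone1/nfsserver/vol and that /vol/zone1 is a
     * separate, shared ZFS dataset on the server.
     */
    #include <stdio.h>
    #include <dirent.h>

    int
    main(void)
    {
            char path[1024];
            DIR *dp;
            struct dirent *de;

            /* zone1's dataset below the already mounted share server:/vol */
            (void) snprintf(path, sizeof (path),
                "/var/zones/nfsmount/%s/%s/%s/%s",
                "zone1", "nfsserver", "vol", "zone1");

            /*
             * Reading the directory is more than a bare lookup/getattr, so
             * a V4 client will mirror-mount server:/vol/zone1 here all by
             * itself.
             */
            if ((dp = opendir(path)) == NULL) {
                    perror(path);
                    return (1);
            }
            while ((de = readdir(dp)) != NULL)
                    (void) printf("%s\n", de->d_name);
            (void) closedir(dp);

            /* 'nfsstat -m' should now also list server:/vol/zone1 */
            return (0);
    }

The same access against a V2/V3 server just shows the (typically empty)
underlying directory in /vol, unless server:/vol/zone1 has been mounted
explicitly beforehand.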
With the advent of the upcoming NFS v4 Referrals support in the V4 server and
V4 client, another 'automatism' in the client can change what we observe of
the mounted server exports on the client running the zone.

On the V4 server (the one hosting our zone image) the administrator might
decide to relocate the export to a different server and then establish a
so-called 'reparse point' (in essence a symlink containing special
information) that redirects a client to the different server now hosting this
export.

NB: other vendors' NFS servers might hand out referrals too.
NB2: the same feature will be supported by our CIFS client.

The V4 client can get a referral event on VOP_LOOKUP(), VOP_GETATTR() and
during initial mount processing by observing the NFS4ERR_MOVED error; it will
then fetch the new location information from the server via the
'fs_locations' attribute. Our client will then go off and automatically mount
the file system from the different server it has been referred to by the
initial server. Again, like mirror mounts, this is done transparently for the
user and inside the kernel.

The minor but important quirk here, as far as observation from the zone's NFS
client is concerned, is that for our mount attempt on (or access to)

server_A:/vol/zones1

we might instead get a mount established for

server_B:/vol/zones1

It is even planned to provide our V2/V3 clients with referral support when
talking to our NFS servers, although the implementation will differ slightly
and I'm not yet sure how such a V2/V3 client referral mount will be observed
on the NFS client.

While this (referrals) currently only affects initial access and mounting, in
the future, with Migration and Replication support being implemented,
literally every NFS v4 over-the-wire operation may get a 'migration event',
i.e. receive NFS4ERR_MOVED. This is still in the early design stages, but we
have to expect that, from the zone NFS client's observability standpoint, the
'nfsserver' portion of the mounted export may silently be 're-written' behind
the scenes instead of a separate second mount being done, i.e. our initial
zone-initiated access/mount:

server_OLD:/vol/zone1

will, once a migration event happens to the client, silently become:

server_NEW:/vol/zone1

This will be reflected in things like nfsstat(1M) output as well.

>if the nfs server has the following share(s) exported:
>
>nfsserver:/vol/zones/zone1
>nfsserver:/vol/zones/zone2
>nfsserver:/vol/zones/zone3
>
>then you would have the following mount(s):
>
>/var/zones/nfsmount/zone1/nfsserver/vol/zones/zone1
>/var/zones/nfsmount/zone2/nfsserver/vol/zones/zone2
>/var/zones/nfsmount/zone3/nfsserver/vol/zones/zone3

As I tried to explain above, the 'nfsserver' part can be a moving target as
far as observability from the zone's NFS client is concerned.

>afaik, determining the mount point should be pretty strait forward.
>i was planning to get a list of all the shares exported by the specified
>nfs server, and then do a strncmp() of all the exported shares against
>the specified path. the longest matching share name is the mount path.

Well, that in turn is anything but straightforward and almost impossible for
NFS v4 servers. For V2/V3 clients, which do use the mount protocol to
instantiate a mount, the client can ask a V2/V3 server's mountd(1M) for a
list of exported file systems via the MOUNTPROC_EXPORT/MOUNTPROC3_EXPORT RPC
procedure. This is what commands like showmount(1M) or dfshares(1M) use to
list a server's exported file systems; however, there's no API available to
do that other than writing your own RPC-aware application doing essentially
rpc_clnt_calls(3NSL) against a remote V2/V3 server's mountd(1M).
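Just to make explicit what that longest-match logic would boil down to, here
is a minimal sketch. It has to shell out to showmount(1M) because there is no
library API for MOUNTPROC_EXPORT, which also means it only ever works against
servers that still answer the mount protocol (V2/V3, or V4 servers that share
over V3 as well); server name and path are just taken from the jurassic
example quoted further down:

    #include <stdio.h>
    #include <string.h>

    /* Return the longest export on "server" that is a prefix of "path". */
    static char *
    best_export(const char *server, const char *path)
    {
            char cmd[1024], line[1024], export[1024];
            static char best[1024];
            size_t bestlen = 0;
            FILE *fp;

            (void) snprintf(cmd, sizeof (cmd),
                "/usr/sbin/showmount -e %s 2>/dev/null", server);
            if ((fp = popen(cmd, "r")) == NULL)
                    return (NULL);

            /* skip the "export list for <server>:" header line */
            (void) fgets(line, sizeof (line), fp);
            while (fgets(line, sizeof (line), fp) != NULL) {
                    size_t len;

                    if (sscanf(line, "%1023s", export) != 1)
                            continue;
                    len = strlen(export);
                    /* the match must end on a pathname component boundary */
                    if (len > bestlen && strncmp(export, path, len) == 0 &&
                        (path[len] == '/' || path[len] == '\0')) {
                            (void) snprintf(best, sizeof (best), "%s", export);
                            bestlen = len;
                    }
            }
            (void) pclose(fp);
            return (bestlen > 0 ? best : NULL);
    }

    int
    main(void)
    {
            char *exp = best_export("jurassic", "/a/b/c/d/file");

            (void) printf("would mount %s\n", exp != NULL ? exp : "<nothing>");
            return (0);
    }

Against jurassic's exports /a, /a/b and /a/b/c this picks /a/b/c; against a
pure V4 server showmount(1M) comes back empty and the whole scheme falls
apart, which is really my point.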
But the V4 protocol does not use the mount protocol at all anymore, so
there's no real programmatic way to retrieve a list of exported file systems
from a V4 server. This would not make much sense in the context of the V4
protocol anyway, because of the way the V4 server constructs its pseudo
namespace starting from the server's root /, potentially involving pseudo
export nodes that eventually bridge to the real share points. You may be
lucky and the file systems are shared for both V3 and V4, in which case you
can at least make an educated guess.

>for example. if we have:
>nfs://jurassic/a/b/c/d/file
>
>and jurassic is exporting:
>jurassic:/a
>jurassic:/a/b
>jurassic:/a/b/c
>
>then our mount path with be:
>/var/zones/nfsmount/jurassic/a/b/c
>
>and our encapsulated zvol will be accessible at:
>/var/zones/nfsmount/jurassic/a/b/c/d/file
>
>afaik, this is acutally the only way that this could be implemented.

For the above reasons I'd rather stay away from implementing logic that
figures out what to mount based on a potential list of exported file systems
from the server, and instead stick with some basics configured via zonecfg
along the lines of:

NFS path = 'nfs://<host>[:port]/<export>'
Zone image = '<[dir to]filename>'

That way we avoid the problem of having to parse the entire currently
proposed SO-URI like:

'nfs://<host>[:port]/<file-absolute>'

and probe what part of that pathname may be suitable for performing a mount.
Of course we could always say that anything before the image file name itself
shall in essence be an exported path suitable for performing a mount. When
talking to V4 servers we could also always just mount the server's root /,
and then any access to the <file-absolute> path will trigger a mirror mount;
this does not work for V2/V3 servers though.

I think we may want to elaborate a bit more on the currently proposed NFS
SO-URI of:

'nfs://<host>[:port]/<file-absolute>'

and its use from zone land to perform mounts and access the zone image.

cheers
frankB