Hey Ed,

I want to comment on the NFS aspects involved here.

On Thu, May 21, 2009 at 3:55 AM, Edward Pilatowicz wrote:
>
>well, it all depends on what nfs shares are actually being exported.
I definitely think we want to refrain from making too many programmatic
assumptions inside the Zones framework about what an NFS server exports, how
the server's exported namespace may look, and how the NFS client (which runs
the zone) handles those exports upon access as opposed to explicit mounting.
That is more or less okay in the NFS v2/v3 (and helper protocols) world, but
it is not always adequate for the V4 protocol and all the work in V4 and V4.1
towards a unified, global namespace. I'll show why in the context of V4 using
the examples you mentioned below.

>if the nfs server has the following share(s) exported:
>
>nfsserver:/vol
>
>then you would have the following mount(s):
>/var/zones/nfsmount/zone1/nfsserver/vol
>/var/zones/nfsmount/zone2/nfsserver/vol
>/var/zones/nfsmount/zone3/nfsserver/vol
>
>if the nfs server has the following share(s) exported:
>
>nfsserver:/vol/zones
>
>then you would have the following mount(s):
>/var/zones/nfsmount/zone1/nfsserver/vol/zones
>/var/zones/nfsmount/zone2/nfsserver/vol/zones
>/var/zones/nfsmount/zone3/nfsserver/vol/zones

In those two examples we have to consider how the V4 server constructs its
pseudo namespace starting at the server's root, including what we call pseudo
exports that bridge to the real exported share points on the server, and how
the V4 client may handle this.

For instance, on the V4 server the export:

/vol

may (and probably will) have different ZFS datasets that host our zones
underneath /vol, e.g.:

/vol/zone1
/vol/zone2
/vol/zone3

Since they are separate ZFS datasets, we cross file system boundaries while
traversing from the exported server's root / over the share point /vol down
to the share points zone1/zone2/zone3 (also presumably exported, otherwise it
wouldn't be useful in our context anyway). We distinguish between the
different file systems based on the FSID attribute: if it changes, we have
crossed a server file system boundary.

With V2/V3 that stops us: the client cannot travel into the new file system
below the initial mount, and a separate mount would have to be performed
(unless we've explicitly mounted the entire path, of course).

However, with V4 the client has the (in our implementation) so-called Mirror
Mount feature. It allows the client to transparently mount those new file
systems on access below the starting share point /vol (provided they are
shared as well) and make them immediately visible without requiring the user
to perform any additional mounts. Those mirror mounts are done automatically
by our V4 client in the kernel as it detects that it would cross server-side
file system boundaries (based on the FSID) on any access other than
VOP_LOOKUP() or VOP_GETATTR().

I.e. if the global zone already has server:/vol mounted, an attempt by the
zone utilities to access (as opposed to explicitly mount) server:/vol/zone1
will automatically mount server:/vol/zone1 into the client's namespace, and
on the client (nfsstat -m) you'd get two mounts:

server:/vol (already existing regular mount)
server:/vol/zone1 (the mirror mount done by the client)

If we really performed a mount though, that would just introduce the mount of
server:/vol/zone1 into the namespace of the client running the zone.
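To make that 'access as opposed to explicit mounting' distinction a bit more
concrete, here is a minimal userland sketch of what the zone tooling would
observe; the /var/zones/nfsmount layout and all names are simply taken from
the examples above, nothing of this exists today:

    /*
     * Rough illustration only: assumes server:/vol is already mounted at
     * /var/zones/nfsmount/zone1/nfsserver/vol and that /vol/zone1 is a
     * separate, shared ZFS dataset on the server.
     */
    #include <stdio.h>
    #include <dirent.h>

    int
    main(void)
    {
            char path[1024];
            DIR *dp;
            struct dirent *de;

            /* zone1's dataset below the already mounted share server:/vol */
            (void) snprintf(path, sizeof (path),
                "/var/zones/nfsmount/%s/%s/%s/%s",
                "zone1", "nfsserver", "vol", "zone1");

            /*
             * Reading the directory is more than a bare lookup/getattr, so
             * a V4 client will mirror-mount server:/vol/zone1 here all by
             * itself.
             */
            if ((dp = opendir(path)) == NULL) {
                    perror(path);
                    return (1);
            }
            while ((de = readdir(dp)) != NULL)
                    (void) printf("%s\n", de->d_name);
            (void) closedir(dp);

            /* 'nfsstat -m' should now also list server:/vol/zone1 */
            return (0);
    }

The same access against a V2/V3 server just shows the (typically empty)
underlying directory in /vol, unless server:/vol/zone1 has been mounted
explicitly beforehand.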
With the advent of the upcoming NFS v4 Referrals support in the V4 server and
V4 client, another 'automatism' in the client can change what we observe of
the mounted server exports on the client running the zone.

On the V4 server (the one hosting our zone image) the administrator might
decide to relocate the export to a different server and then establish a
so-called 'reparse point' (in essence a symlink containing special
information) that redirects a client to the different server now hosting this
export.

NB: other vendors' NFS servers might hand out referrals too.
NB2: the same feature will be supported by our CIFS client.

The V4 client can get a referral event on VOP_LOOKUP(), VOP_GETATTR() and
during initial mount processing by observing the NFS4ERR_MOVED error; it will
then fetch the new location information from the server via the
'fs_locations' attribute. Our client will then go off and automatically mount
the file system from the different server it has been referred to by the
initial server. Again, like mirror mounts, this is done transparently for the
user and inside the kernel.

The minor but important quirk here, as far as observation from the zone's NFS
client is concerned, is that for our mount attempt on (or access to)

server_A:/vol/zones1

we might instead get a mount established for

server_B:/vol/zones1

It is even planned to provide our V2/V3 clients with referral support when
talking to our NFS servers, although the implementation will differ slightly
and I'm not yet sure how such a V2/V3 client referral mount will be observed
on the NFS client.

While this (referrals) currently only affects initial access and mounting, in
the future, with Migration and Replication support being implemented,
literally every NFS v4 over-the-wire operation may get a 'migration event',
i.e. receive NFS4ERR_MOVED. This is still in the early design stages, but we
have to expect that, from the zone NFS client's observability standpoint, the
'nfsserver' portion of the mounted export may silently be 're-written' behind
the scenes instead of a separate second mount being done, i.e. our initial
zone-initiated access/mount:

server_OLD:/vol/zone1

will, once a migration event happens to the client, silently become:

server_NEW:/vol/zone1

This will be reflected in things like nfsstat(1M) output as well.

>if the nfs server has the following share(s) exported:
>
>nfsserver:/vol/zones/zone1
>nfsserver:/vol/zones/zone2
>nfsserver:/vol/zones/zone3
>
>then you would have the following mount(s):
>
>/var/zones/nfsmount/zone1/nfsserver/vol/zones/zone1
>/var/zones/nfsmount/zone2/nfsserver/vol/zones/zone2
>/var/zones/nfsmount/zone3/nfsserver/vol/zones/zone3

As I tried to explain above, the 'nfsserver' part can be a moving target as
far as observability from the zone's NFS client is concerned.

>afaik, determining the mount point should be pretty strait forward.
>i was planning to get a list of all the shares exported by the specified
>nfs server, and then do a strncmp() of all the exported shares against
>the specified path. the longest matching share name is the mount path.

Well, that in turn is anything but straightforward and almost impossible for
NFS v4 servers. For V2/V3 clients, which do use the mount protocol to
instantiate a mount, the client can ask a V2/V3 server's mountd(1M) for a
list of exported file systems via the MOUNTPROC_EXPORT/MOUNTPROC3_EXPORT RPC
procedure. This is what commands like showmount(1M) or dfshares(1M) use to
list a server's exported file systems; however, there's no API available to
do that other than writing your own RPC-aware application doing essentially
rpc_clnt_calls(3NSL) against a remote V2/V3 server's mountd(1M).
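Just to make explicit what that longest-match logic would boil down to, here
is a minimal sketch. It has to shell out to showmount(1M) because there is no
library API for MOUNTPROC_EXPORT, which also means it only ever works against
servers that still answer the mount protocol (V2/V3, or V4 servers that share
over V3 as well); server name and path are just taken from the jurassic
example quoted further down:

    #include <stdio.h>
    #include <string.h>

    /* Return the longest export on "server" that is a prefix of "path". */
    static char *
    best_export(const char *server, const char *path)
    {
            char cmd[1024], line[1024], export[1024];
            static char best[1024];
            size_t bestlen = 0;
            FILE *fp;

            (void) snprintf(cmd, sizeof (cmd),
                "/usr/sbin/showmount -e %s 2>/dev/null", server);
            if ((fp = popen(cmd, "r")) == NULL)
                    return (NULL);

            /* skip the "export list for <server>:" header line */
            (void) fgets(line, sizeof (line), fp);
            while (fgets(line, sizeof (line), fp) != NULL) {
                    size_t len;

                    if (sscanf(line, "%1023s", export) != 1)
                            continue;
                    len = strlen(export);
                    /* the match must end on a pathname component boundary */
                    if (len > bestlen && strncmp(export, path, len) == 0 &&
                        (path[len] == '/' || path[len] == '\0')) {
                            (void) snprintf(best, sizeof (best), "%s", export);
                            bestlen = len;
                    }
            }
            (void) pclose(fp);
            return (bestlen > 0 ? best : NULL);
    }

    int
    main(void)
    {
            char *exp = best_export("jurassic", "/a/b/c/d/file");

            (void) printf("would mount %s\n", exp != NULL ? exp : "<nothing>");
            return (0);
    }

Against jurassic's exports /a, /a/b and /a/b/c this picks /a/b/c; against a
pure V4 server showmount(1M) comes back empty and the whole scheme falls
apart, which is really my point.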
But the V4 protocol does not use the mount protocol at all anymore, so
there's no real programmatic way to retrieve a list of exported file systems
from a V4 server. This would not make much sense in the context of the V4
protocol anyway, because of the way the V4 server constructs its pseudo
namespace starting from the server's root /, potentially involving pseudo
export nodes that eventually bridge to the real share points. You may be
lucky and the file systems are shared for both V3 and V4, in which case you
can at least make an educated guess.

>for example. if we have:
>nfs://jurassic/a/b/c/d/file
>
>and jurassic is exporting:
>jurassic:/a
>jurassic:/a/b
>jurassic:/a/b/c
>
>then our mount path with be:
>/var/zones/nfsmount/jurassic/a/b/c
>
>and our encapsulated zvol will be accessible at:
>/var/zones/nfsmount/jurassic/a/b/c/d/file
>
>afaik, this is acutally the only way that this could be implemented.

For the above reasons I'd rather stay away from implementing logic that
figures out what to mount based on a potential list of exported file systems
from the server, and instead stick with some basics configured via zonecfg
along the lines of:

NFS path = 'nfs://<host>[:port]/<export>'
Zone image = '<[dir to]filename>'

That way we avoid the problem of having to parse the entire currently
proposed SO-URI like:

'nfs://<host>[:port]/<file-absolute>'

and probe what part of that pathname may be suitable for performing a mount.
Of course we could always say that anything before the image file name itself
shall in essence be an exported path suitable for performing a mount. When
talking to V4 servers we could also always just mount the server's root /,
and then any access to the <file-absolute> path will trigger a mirror mount;
this does not work for V2/V3 servers though.

I think we may want to elaborate a bit more on the currently proposed NFS
SO-URI of:

'nfs://<host>[:port]/<file-absolute>'

and its use from zone land to perform mounts and access the zone image.

cheers
frankB