2011-10-14 15:53, Edward Ned Harvey wrote:
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Jim Klimov

>> I guess Richard was correct about the use-case description -
>> I should detail what I'm thinking about, to give some illustration.
> After reading all this, I'm still unclear on what you want to accomplish that
> isn't already done today.  Yes, I understand what it means when we say ZFS is
> not a clustering filesystem, and yes, I understand what benefits there would be
> to gain if it were a clustering FS.  But in all of what you're saying below, I
> don't see that you need a clustering FS.

In my example - probably not a completely clustered FS.
A clustered ZFS pool with datasets individually owned by
specific nodes at any given time would suffice for such
VM farms. This would give users the benefits of ZFS
(resilience, snapshots and clones, shared free space)
merged with the speed of direct disk access, instead of
the latency of going through a storage server to reach these disks.

This is why I think such a solution may be simpler
than a fully-fledged POSIX-compliant shared FS, while it
would still have real benefits for specific - and popular -
use cases. And it might pave the way for a more complete
solution - or perhaps illustrate what should not be done
in those solutions ;)

After all, I think that if the problem of safe multiple-node
RW access to ZFS gets fundamentally solved, the usages
I described before might just become a couple of new
dataset types with specific predefined semantics and
limitations - just as POSIX-compliant FS datasets and
block-based volumes are defined over ZFS now. There
would be no reason not to call them "clustered FS" and
"clustered volume" datasets, for example ;)
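
To make the analogy concrete: this is how the two existing dataset
types are created today (real ZFS commands), and a clustered variant
might differ by little more than an ownership attribute. The
"owner-node" property below is purely invented for illustration -
no such syntax exists in ZFS:

  # Existing dataset types, defined over a ZFS pool today:
  zfs create tank/vmfarm/configs        # POSIX-compliant filesystem dataset
  zfs create -V 20G tank/vmfarm/vm42    # block-based volume (zvol)

  # HYPOTHETICAL clustered variants - invented syntax, not real ZFS:
  # zfs create -V 20G -o owner-node=blade3 tank/vmfarm/vm42
  # zfs set owner-node=blade5 tank/vmfarm/vm42   # hand the volume to another blade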

AFAIK, VMFS is not a generic filesystem, and cannot
quite be used "directly" by software applications, but it
has its target market for shared VM farming...

I do not know how they solve the problems of consistency
control - with master nodes or something else - and for
the sake of avoiding patent encroachment, I'm afraid I'd
rather not know, so as not to copy someone's solution
and get burnt for that ;)


>> [...] of these deployments become VMWare ESX farms with shared
>> VMFS. Due to my stronger love for things Solaris, I would love
>> to see ZFS and any of the Solaris-based hypervisors (VBox, Xen
>> or KVM ports) running there instead. But for things to be as
>> efficient, ZFS would have to become shared - clustered...
> I think the solution people currently use in this area is either NFS or iSCSI.
> (Or infiniband, and other flavors.)  You have a storage server presenting the
> storage to the various vmware (or whatever) hypervisors.

In fact, no. Based on the MFSYS model, there is no storage server.
There is a built-in storage controller which can do RAID over HDDs
and present SCSI LUNs to the blades over direct SAS access.
These LUNs can be accessed individually by certain servers, or
concurrently. In the latter case the servers can either take turns
mounting a LUN as an HDD with some single-server FS, or use
a clustered FS to share the LUN's disk space simultaneously.
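
To a blade, such a LUN is just another local disk, so today the
turn-taking variant is the only one ZFS supports. A minimal sketch,
with an illustrative device name:

  # One blade creates a pool directly on the shared SAS LUN, which it
  # sees as an ordinary disk (c2t1d0 is an invented example name):
  zpool create vmdata c2t1d0

  # ZFS stamps this blade's hostid into the pool labels; another blade
  # trying to import the same pool concurrently is refused unless it
  # forces the import - and a forced concurrent import corrupts the pool.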

If we were to use an OpenSolaris-based OS and VirtualBox/Xen/KVM
in this system as they are now, and hoped for live migration of VMs
without copying of data, we would have to make a separate LUN for
each VM on the controller, and mount/import this LUN on whichever
host currently runs the VM. I don't need to explain why that would
be a clumsy and inflexible solution for a near-infinite number of
reasons, do I? ;)
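
Spelled out, each migration would then be a handoff like this
(one pool per per-VM LUN; all names invented for the example):

  # On the source blade: pause the VM, then release its private pool
  zpool export vm42pool

  # On the target blade: the same LUN is already visible over SAS
  zpool import vm42pool
  # ...and only then can the VM resume. One LUN, one pool and one
  # export/import round-trip per VM - and no shared free space,
  # since each VM's size is frozen into its LUN at carving time.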

> Everything works.  What's missing?  And why does this need to be a clustering
> FS?


>> To be clearer, I should say that modern VM hypervisors can
>> migrate running virtual machines between two VM hosts.
> This works on NFS/iSCSI/IB as well.  Doesn't need a clustering FS.
Except that the storage controller doesn't do NFS/iSCSI/IB,
and doesn't do snapshots and clones. And if I were to
dedicate one or two out of six blades to storage tasks,
that might be considered an improper waste of resources.
It would also repackage SAS access (available to all blades
at full bandwidth anyway) into NFS/iSCSI access over a
Gbit link...
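
For reference, turning a blade into such a storage head would amount
to something like the following (real OpenSolaris/COMSTAR commands,
pool and dataset names invented) - with every byte then squeezed
through that blade's GbE port instead of each blade's own SAS path:

  # Share a dataset over NFS to the other blades:
  zfs set sharenfs=on vmdata/images

  # Or carve out a zvol and re-export it over iSCSI via COMSTAR:
  zfs create -V 20G vmdata/vm42
  sbdadm create-lu /dev/zvol/rdsk/vmdata/vm42
  stmfadm add-view <GUID-printed-by-sbdadm>
  itadm create-target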



>> With clustered VMFS on shared storage, VMWare can
>> migrate VMs faster - it knows not to copy the HDD image
>> file in vain - it will be equally available to the "new host"
>> at the correct point in migration, just as it was accessible
>> to the "old host".
> Again.  NFS/iSCSI/IB = ok.

True, except that this is not an optimal solution in the described
use case - a farm of server blades with relatively dumb, fast, raw
storage (but NOT an intelligent storage server).

//Jim
