>>>>> "et" == Erik Trimble <erik.trim...@oracle.com> writes:

    et> With NFS-hosted VM disks, do the same thing: create a single
    et> filesystem on the X4540 for each VM.

previous posters pointed out that vmware has unreasonable hard limits
on the number of NFS mounts or iSCSI connections or some such, so you
will probably run into that snag when attempting to use the much
faster snapshotting/cloning in ZFS.

    >>> * Are the FSYNC speed issues with NFS resolved?
    >>> 
    et> The ZIL SSDs will compensate for synchronous write issues in
    et> NFS.

okay, but for VMs I think this often doesn't matter, because NFSv3 and
v4 only add fsync()s on file close, and a virtual disk is one giant
file that the client never closes.  There may still be synchronous
writes coming through if they don't get blocked in LVM2 inside the
guest or in the vm software, but whatever comes through ought to be
exactly the same number of writes for NFS as for iSCSI, unless the vm
software has different bugs in its NFS and iSCSI back-ends.
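
For completeness, the ZIL SSDs Erik mentions are, as far as I know,
just added to the pool as log devices, something like this (pool and
device names made up):

 zpool add tank log mirror c2t0d0 c2t1d0   # mirrored SSD slog
 zpool status tank                         # SSDs appear under "logs"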

the other difference is in the latest comstar, which runs in
sync-everything mode by default, AIUI.  Or maybe it uses that mode
only when zvol-backed?  I have the impression it went through many
rounds of quiet changes, both in comstar and in zvols, on its way to
its present form.  I've heard it said here that you can change the
mode both from the comstar host and on the remote initiator, but I
don't know how to do it or how sticky the change is.  If you stuck
with the default sync-everything, though, I think NFS would be a lot
faster.  This is if we are comparing one giant .vmdk or similar on
NFS against one zvol.  If we are comparing an exploded filesystem on
NFS mounted through the virtual network adapter, then of course
you're right again, Erik.
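
If I remember right, the knob on the comstar side is the LU
writeback-cache property, roughly the below---unverified, and
<lu-guid> is a placeholder:

 stmfadm list-lu -v                        # shows a "Writeback Cache" line
 stmfadm modify-lu -p wcd=true <lu-guid>   # cache off, i.e. sync-everything
 stmfadm modify-lu -p wcd=false <lu-guid>  # cache on, fewer sync writes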

The integrity tests for this tradeoff are: (1) reboot the Solaris
storage host without rebooting the vmware hosts & guests and see what
happens, and (2) cord-yank the vmware host.  Both of these are
probably more dangerous than (3) commanding the vm software to
virtual-cord-yank the guest.
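
Whichever yank you do, the check afterward on the Solaris side is
just a scrub (pool name made up):

 zpool scrub tank
 zpool status -v tank    # wait for the scrub to finish, look for errors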

    >>> * Should I go with fiber channel, or will the 4 built-in 1Gbe
    >>> NIC's give me enough speed?

FC has different QoS properties than Ethernet because of the
buffer-credit mechanism---it can exert back-pressure all the way
through the fabric.  The same goes for IB, which is HOL-blocking.
This is a big deal for storage, with its bursty writes of large
blocks, which is not the case TCP shines at.  I would try both and
compare, if you can afford it!
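
If you stay on the four onboard 1GbE ports, aggregating them is cheap
to try---something like this on a Crossbow-era build (link names made
up, syntax from memory), keeping in mind a single TCP stream still
only gets one link's worth of bandwidth:

 dladm create-aggr -l e1000g0 -l e1000g1 -l e1000g2 -l e1000g3 aggr0
 dladm show-aggr
 ifconfig aggr0 plumb <address>/24 up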

    je> IMHO Solaris Zones with LOFS mounted ZFSs gives you the
    je> highest flexibility in all directions, probably the best
    je> performance and least resource consumption, fine grained
    je> resource management (CPU, memory, storage space) and less
    je> maintainance stress etc...

yeah, zones are really awesome, especially combined with clones and
snapshots.  For once the clunky post-Unix XML crappo Solaris
interfaces are actually something I appreciate a little, because a
lot of their value comes from being able to do consistent, repeatable
operations on them.
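
e.g. the clone workflow I mean (zone names made up; with a ZFS
zonepath, zoneadm does the snapshot+clone for you):

 zonecfg -z gold export -f /tmp/gold.cfg
 # edit zonepath, network, etc. in /tmp/gold.cfg, then:
 zonecfg -z web2 -f /tmp/gold.cfg
 zoneadm -z web2 clone gold
 zoneadm -z web2 boot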

The problem is that the zones run Solaris instead of Linux.  BrandZ
never got far enough to, for example, run Apache under a
2.6-kernel-based distribution, so I don't find it useful for any real
work.  I do keep a CentOS 3.8 (I think?) brandz zone around, but not
for anything production---just so I can try things there when I
suspect a new/weird version of a tool might be broken.

as for native zones, the ipkg repository, and even the jucr
repository, has two-year-old versions of everything---django/python,
gcc, movabletype.  Many things are missing outright, like nginx.  I'm
very disappointed that Solaris did not adopt an upstream package
system the way Dragonfly did.  Gentoo or pkgsrc would have been very
smart, IMHO.  Even opencsw is based on Nick Moffitt's GAR system, an
old, mostly-abandoned tool for building bleeding-edge Gnome on Linux.
The ancient, perpetually-abandoned set of packages on jucr and the
crufty, poorly-factored RPM-like spec files leave me with little
interest in contributing to jucr myself.  If Solaris had instead
poured that effort into one of these already-portable package
systems, the way they poured it into Mercurial after adopting it,
then I'd be looking into (a) contributing the packages I need most,
and (b) using whatever system Solaris picked on my non-Solaris
systems too.  This crap/marginalized build system means I need to
look at a way to host Linux under Solaris, using Solaris basically
just for ZFS and nothing else.  The alternative is to spend heaps of
time re-inventing the wheel, only to end up with an environment less
rich than the competitors' and charge twice as much for it, like
joyent.

But, yeah, while working on Solaris I would never install anything in
the global zone after discovering how easy it is to work with ipkg
zones.  They are really brilliant, and unlike everyone else's
attempts at these superchroots (FreeBSD jails, johncompanies.com), I
feel like zones are basically finished.
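
For reference, the minimal ipkg-zone recipe I mean is roughly this
(zone name and zonepath made up):

 zonecfg -z web
 zonecfg:web> create
 zonecfg:web> set zonepath=/export/zone/web
 zonecfg:web> commit
 zonecfg:web> exit
 zoneadm -z web install     # pulls the zone's packages from the ipkg repo
 zoneadm -z web boot
 zlogin -C web              # console login to answer the first-boot questions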

however...  because of:

  http://mail.opensolaris.org/pipermail/zfs-discuss/2009-October/032878.html

I wonder if it might be better to mount ZFS datasets directly in the
zones rather than lofs-mounting them.  It's easy to do.  The short
version is:

 1. create dataset outside the zone with mountpoint=none
 2. add dataset to the zone with zonecfg
 3. set the dataset's mountpoint from a shell inside the zone

Long version below.
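
For contrast, the lofs arrangement je describes would be the usual
zonecfg fs block, roughly (paths made up):

 zonecfg -z awabagal
 zonecfg:awabagal> add fs
 zonecfg:awabagal:fs> set type=lofs
 zonecfg:awabagal:fs> set special=/tub/export/somedata
 zonecfg:awabagal:fs> set dir=/data
 zonecfg:awabagal:fs> end
 zonecfg:awabagal> commit
 zonecfg:awabagal> exit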

postgres cheatsheet:
-----8<-----
http://blogs.sun.com/jkshah/entry/opensolaris_2008_11_and_postgresql

need to make a dataset outside the zbe for postgres data so it'll escape
beadm snapshotting/cloning once that's working within zones for
image-update.  setting mountpoints for zoned datasets is weird, though:
http://mail.opensolaris.org/pipermail/zones-discuss/2009-January/004661.html

outside the zone:
 zfs list -r tub/export/zone
  NAME                                                USED  AVAIL  REFER  MOUNTPOINT
  tub/export/zone                                    27.1G   335G  40.3K  /export/zone
  tub/export/zone/awabagal                            917M   335G  37.4K  /export/zone/awabagal
  tub/export/zone/awabagal/ROOT                       917M   335G  31.4K  legacy
  tub/export/zone/awabagal/ROOT/zbe                   917M   335G  2.72G  legacy
 zfs create -o mountpoint=none tub/export/zone/awabagal/postgres-data
 zonecfg -z awabagal
 zonecfg:awabagal> add dataset
 zonecfg:awabagal:dataset> set name=tub/export/zone/awabagal/postgres-data
 zonecfg:awabagal:dataset> end
 zonecfg:awabagal> commit
 zonecfg:awabagal> exit
inside the zone:
 zfs list
  NAME                                     USED  AVAIL  REFER  MOUNTPOINT
  tub                                     1.33T   335G   498K  /tub
  tub/export                               295G   335G  63.2M  /export
  tub/export/zone                         27.1G   335G  40.3K  /export/zone
  tub/export/zone/awabagal                 919M   335G  37.4K  /export/zone/awabagal
  tub/export/zone/awabagal/ROOT            919M   335G  31.4K  legacy
  tub/export/zone/awabagal/ROOT/zbe        919M   335G  2.73G  legacy
  tub/export/zone/awabagal/postgres-data  31.4K   335G  31.4K  none
 zfs set mountpoint=/var/postgres tub/export/zone/awabagal/postgres-data

the /var/postgres directory is magical and hardcoded into the package.
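
to double-check the dataset actually landed there before installing
anything:

 df -h /var/postgres
 zfs get zoned,mountpoint tub/export/zone/awabagal/postgres-data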

the rest, you do inside the zone:
pkg install SUNWpostgr-83-server SUNWpostgr-83-client SUNWpostgr-jdbc \
 SUNWpostgr-83-contrib SUNWpostgr-83-docs SUNWpostgr-83-devel \
 SUNWpostgr-83-tcl SUNWpostgr-83-pl SUNWpgadmin3
svccfg import /var/svc/manifest/application/database/postgresql_83.xml
svcadm enable postgresql_83:default_64bit

add /usr/postgres/8.3/bin to {,SU}PATH in /etc/default/{login,su}
-----8<-----
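
and to confirm the service actually came up afterwards:

 svcs -xv postgresql_83
 svcs -l postgresql_83:default_64bit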
