Comments in-line.

On 6/6/2010 9:16 PM, Ken wrote:
I'm looking at VMWare, ESXi 4, but I'll take any advice offered.

On Sun, Jun 6, 2010 at 19:40, Erik Trimble <erik.trim...@oracle.com> wrote:

    On 6/6/2010 6:22 PM, Ken wrote:
    Hi,

    I'm looking to build a virtualized web hosting server environment
    accessing files on a hybrid storage SAN.  I was looking at using
    the Sun Fire X4540 with the following configuration:

        * 6 RAID-Z vdevs with one hot spare each (all 500GB 7200RPM
          SATA drives)
        * 2 Intel X-25 32GB SSD's as a mirrored ZIL
        * 4 Intel X-25 64GB SSD's as the L2ARC.
        * De-duplication
        * LZJB compression

    The clients will be Apache web hosts serving hundreds of domains.

    I have the following questions:

        * Should I use NFS with all five VM's accessing the exports,
          or one LUN for each VM, accessed over iSCSI?

Generally speaking, it depends on your comfort level with either running iSCSI volumes to hold the VMs, or serving everything out via NFS (hosting the VM disk files in an NFS filesystem).

If you go the iSCSI route, I would definitely go with one iSCSI volume per VM - note that you can create multiple zvols per zpool on the X4540, so giving each VM its own volume isn't limiting in any way. It's a lot simpler, easier, and allows for nicer management (snapshots/cloning/etc. on the X4540 side) if you go with a VM per iSCSI volume.

With NFS-hosted VM disks, do the same thing: create a single filesystem on the X4540 for each VM.
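
For concreteness, a rough sketch of what that looks like on the X4540 side (pool, dataset, and VM names here are made up, and recent builds expose iSCSI through COMSTAR):

    # one zvol per VM, shared over iSCSI via COMSTAR
    zfs create tank/vm
    zfs create -V 30G tank/vm/webvm01
    itadm create-target
    sbdadm create-lu /dev/zvol/rdsk/tank/vm/webvm01
    stmfadm add-view <LU-GUID-printed-by-sbdadm>

    # or: one filesystem per VM, shared over NFS
    zfs create -p tank/vm-nfs/webvm01
    zfs set sharenfs=on tank/vm-nfs/webvm01

Either way, you get per-VM snapshots and clones for free on the X4540 side.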

Performance-wise, I'd have to test, but I /think/ the iSCSI route will be faster, even with the ZIL SSDs.

In all cases, regardless of how you host the VM images themselves, I'd serve out the website files via NFS. I'm not sure how ESXi works, but under something like Solaris/VirtualBox, I could set up the base Solaris system to run CacheFS for an NFS share, and then give all the VBox instances local access to that single NFS mountpoint. That would allow for heavy client-side caching of important data for your web servers. If you're careful, you can separate read-only data from write-only data, which would allow you even better performance tweaks. I tend to like to have the host OS handle as much network traffic and caching of data as possible instead of each VM doing it; it tends to be more efficient that way.
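
As a sketch, the CacheFS piece on a Solaris host looks roughly like this (server name, export path, and cache directory are placeholders):

    # create the local on-disk cache once
    cfsadmin -c /var/cache/webcache

    # mount the X4540's NFS export through CacheFS so repeated reads
    # are served from local disk instead of the network
    mount -F cachefs -o backfstype=nfs,cachedir=/var/cache/webcache \
        x4540:/export/websites /websites

The VBox instances can then all be pointed at that single /websites mountpoint on the host.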


        * Are the FSYNC speed issues with NFS resolved?

The ZIL SSDs will compensate for synchronous write issues in NFS. They won't completely eliminate them, but you shouldn't notice problems with sync writes until you're up at pretty heavy loads.
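
If the pool starts life without log devices, adding the mirrored ZIL SSDs later is a one-liner (disk names are placeholders):

    # attach a mirrored pair of SSDs as a dedicated log (ZIL) device
    zpool add tank log mirror c4t0d0 c4t1d0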

        * Should I go with fiber channel, or will the 4 built-in 1Gbe
          NIC's give me enough speed?

Depending on how much RAM you have and how much local data caching you do (and the specifics of the website accesses), the 4 GbE ports should be fine. However, if you want more, I'd get another quad-GbE card and then run at least 2 guest instances per client host. Try very hard to have the equivalent of a full GbE link available per VM. Personally, I'd go for client hardware that has 4 GbE interfaces: one each for two VMs, one for external internet access, and one for management. I'd then run the X4540 with 8 GbE ports bonded (trunked/teamed/whatever) together. This might be overkill, so see what your setup requires in terms of available bandwidth.
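
On the X4540 side, bonding the ports is a dladm aggregation; the exact syntax depends on the Solaris build, but the classic form looks something like this (interface names are placeholders, and the switch needs matching LACP/trunk configuration):

    # aggregate four GbE ports into aggregation key 1 with LACP active
    dladm create-aggr -l active -d e1000g0 -d e1000g1 -d e1000g2 -d e1000g3 1
    ifconfig aggr1 plumb
    ifconfig aggr1 192.168.10.10 netmask 255.255.255.0 up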

        * How many SSD's should I use for the ZIL and L2ARC?

Being a web-hosting box, your data pattern is likely to be 99% reads, with small random writes making up the remaining 1%. You need just enough high-performance SSD for the ZIL. Honestly, the 32GB X25-E is larger than you'll likely ever need. I can't recommend anything else for the money, but the sad truth is that ZFS really only needs 1-2GB of NVRAM for the ZIL (for most use cases). So get the smallest device you can find that still satisfies the high-performance requirement. Caveat: look at the archives for all the talk about protecting your ZIL device from power outages (and the lack of a capacitor in most modern SSDs).

For L2ARC, go big. Website files tend to be /very/ small, so you're in the worst use case for Dedup. With something like an X4540 and its huge data capacity, get as much L2ARC SSD space as you can afford. Remember: 250 bytes per Dedup block. If you have 1k blocks for all those little files, well, your L2ARC needs to be 25% of your data size. *Ouch*  Now, you don't have to buy the super-expensive stuff for L2ARC: the good old Intel X25-M works just fine. Don't mirror them.

Given the potentially explosive size of your DDT, I'd think long and hard about which data you really want to Dedup. Disk is cheap, but SSD isn't. The good news is that you can selectively decide which datasets to Dedup. Ain't ZFS great?
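
Dedup is just a per-dataset property, and zdb can simulate how big the DDT would get on existing data before you commit to it (pool and dataset names are made up):

    # dedup only where it's likely to pay off
    zfs set dedup=on tank/web/shared-assets
    zfs set dedup=off tank/web/logs

    # simulate dedup across the pool and print the projected DDT histogram
    zdb -S tank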


        * What pool structure should I use?

If it were me (and, given what little I know of your data), I'd go like this (a rough command-level sketch follows the outline):

(1) pool for VMs:
        8 disks, MIRRORED
        1 SSD for L2ARC
        one Zvol per VM instance, served via iSCSI, each with:
                DD turned ON,  Compression turned OFF

(1) pool for clients to write data to (log files, incoming data, etc.)
        6 or 8 disks, MIRRORED
        2 SSDs for ZIL, mirrored
        ideally, as many filesystems as you have webSITES, not just client VMs
          (as this might be unwieldy for 100s of websites, segregate them into
          obvious groupings, taking care with write/read permissions)
        NFS served
        DD OFF, Compression ON (or OFF, if you seem to be having CPU overload
          on the X4540)

(1) pool for client read-only data
        All the rest of the disks, split into 7 or 8-disk RAIDZ2 vdevs
        All the remaining SSDs for L2ARC
        as many filesystems as you have webSITES, not just client VMs
          (however, see above)
        NFS served
        DD ON for selected websites (filesystems), Compression ON for everything

(2) Global hot spares.
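
Translated into commands, that layout would look roughly like the following sketch (all pool and disk names are placeholders, and the disk counts should match whatever you finally settle on):

    # pool for VMs: 8 disks as 4 mirror pairs, plus 1 L2ARC SSD
    zpool create vmpool mirror c1t0d0 c2t0d0 mirror c1t1d0 c2t1d0 \
        mirror c1t2d0 c2t2d0 mirror c1t3d0 c2t3d0 cache c6t0d0
    zfs create -o dedup=on -o compression=off vmpool/vm

    # pool for client writes: mirrored disks plus mirrored ZIL SSDs
    zpool create writepool mirror c1t4d0 c2t4d0 mirror c1t5d0 c2t5d0 \
        mirror c1t6d0 c2t6d0 log mirror c6t1d0 c6t2d0
    zfs create -o dedup=off -o compression=on -o sharenfs=on writepool/logs

    # pool for read-only data: RAIDZ2 vdevs plus the remaining L2ARC SSDs
    zpool create readpool \
        raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t6d0 \
        raidz2 c4t0d0 c4t1d0 c4t2d0 c4t3d0 c4t4d0 c4t5d0 c4t6d0 \
        cache c6t3d0 c6t4d0 c6t5d0
    zfs create -o compression=on -o sharenfs=on readpool/sites

    # hot spares can be shared between pools on the same host
    zpool add vmpool spare c5t0d0 c5t1d0
    zpool add writepool spare c5t0d0 c5t1d0
    zpool add readpool spare c5t0d0 c5t1d0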


    I know these questions are slightly vague, but any input would be
    greatly appreciated.

    Thanks!


    Which Virtual Machine technology are you going to use?

    VirtualBox
    VMWare
    Xen
    Solaris Zones
    Something else...

    It will make a difference as to my recommendation (or, do you want
    me to recommend a VM type, too?)

    <grin>







--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
