Comments in-line.
On 6/6/2010 9:16 PM, Ken wrote:
I'm looking at VMWare, ESXi 4, but I'll take any advice offered.
On Sun, Jun 6, 2010 at 19:40, Erik Trimble <erik.trim...@oracle.com
<mailto:erik.trim...@oracle.com>> wrote:
On 6/6/2010 6:22 PM, Ken wrote:
Hi,
I'm looking to build a virtualized web hosting server environment
accessing files on a hybrid storage SAN. I was looking at using
the Sun X-Fire x4540 with the following configuration:
* 6 RAID-Z vdevs with one hot spare each (all 500GB 7200RPM SATA drives)
* 2 Intel X-25 32GB SSDs as a mirrored ZIL
* 4 Intel X-25 64GB SSDs as the L2ARC
* De-duplication
* LZJB compression
The clients will be Apache web hosts serving hundreds of domains.
I have the following questions:
* Should I use NFS with all five VMs accessing the exports, or one LUN
for each VM, accessed over iSCSI?
Generally speaking, it depends on your comfort level with running iSCSI
Volumes to put the VMs in, or serving everything out via NFS (hosting
the VM disk file in an NFS filesystem).
If you go the iSCSI route, I would definitely go the "one iSCSI volume
per VM" route - note that you can create multiple zvols per zpool on the
X4540, so giving each VM its own volume doesn't limit you in any way.
It's a lot simpler, easier, and allows for nicer management
(snapshots/cloning/etc. on the X4540 side) if you go with one VM per
iSCSI volume; a sketch of the X4540 side follows below.
With NFS-hosted VM disks, do the same thing: create a single filesystem
on the X4540 for each VM.
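For concreteness, a minimal sketch of the "one zvol per VM" setup on the
X4540 side, assuming COMSTAR is installed (the pool, dataset, and size
here are all invented):

    # Create a 40GB zvol to back one VM's disk.
    zfs create -V 40g tank/vols/vm-web01

    # Register the zvol as a SCSI logical unit under COMSTAR,
    # create an iSCSI target, and expose the LU to initiators.
    sbdadm create-lu /dev/zvol/rdsk/tank/vols/vm-web01
    itadm create-target
    stmfadm add-view <GUID-printed-by-sbdadm>

    # Per-VM snapshots and clones then come for free:
    zfs snapshot tank/vols/vm-web01@pre-upgrade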
Performance-wise, I'd have to test, but I /think/ the iSCSI route will
be faster. Even with the ZIL SSDs.
In all cases, regardless of how you host the VM images themselves, I'd
serve out the website files via NFS. I'm not sure how ESXi works, but
under something like Solaris/VBox, I could set up the base Solaris
system to run CacheFS for an NFS share, and then give all the VBox
instances local access to that single NFS mountpoint. That would allow
for heavy client-side caching of important data for your web servers.
If you're careful, you can separate read-only data from write-mostly
data, which would allow you even better performance tweaks. I tend to
like to have the host OS handle as much network traffic and caching of
data as possible instead of each VM doing it; it tends to be more
efficient that way.
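Roughly, the CacheFS setup on each Solaris host would look like this
(server name, export, and paths are made up):

    # One-time: create the local cache directory.
    cfsadmin -c /var/cache/webdocs

    # Mount the X4540's NFS export through CacheFS so reads get
    # cached on the host's local disk.
    mount -F cachefs -o backfstype=nfs,cachedir=/var/cache/webdocs \
        x4540:/export/webdocs /webdocs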
* Are the FSYNC speed issues with NFS resolved?
The ZIL SSDs will compensate for synchronous write issues in NFS. They
won't completely eliminate them, but you shouldn't notice problems with
sync writes until you're up at pretty heavy loads.
* Should I go with Fibre Channel, or will the 4 built-in 1GbE NICs give
me enough speed?
Depending on how much RAM and how much local data caching you do (and
the specifics of the web site accesses), 4 GbE should be fine. However,
if you want more, I'd get another quad GbE card, and then run at least 2
guest instances per client machine. Try very hard to have the
equivalent of a full GbE available per VM. Personally, I'd go for
client hardware that has 4 GbE interfaces: one each for two VMs, one for
external internet access, and one for management. I'd then run the
X4540 with 8 GbE bonded (trunked/teamed/whatever) together; a dladm
sketch follows below. This might be overkill, so see what your setup
requires in terms of available bandwidth.
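The aggregation on the X4540 side is a couple of dladm commands
(OpenSolaris-era syntax; the interface names are examples, and your
switch needs to support 802.3ad/LACP):

    # Bond four (or eight) GbE interfaces into one aggregated link.
    dladm create-aggr -l e1000g0 -l e1000g1 -l e1000g2 -l e1000g3 aggr0

    # Turn on LACP negotiation with the switch.
    dladm modify-aggr -L active aggr0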
* How many SSD's should I use for the ZIL and L2ARC?
Being a website mux, your data pattern is likely to be 99% read, with
small random writes being the remaining 1%. You need just enough
high-performance SSD for the ZIL. Honestly, the 32GB X25-E is larger
than you'll likely ever need. I can't recommend anything else for the
money, but the sad truth is that ZFS really only needs 1-2GB of NVRAM
for the ZIL (for most use cases). So get the smallest device you can
find that still satisfies the high-performance requirement. Caveat:
look at the archives for all the talk about protecting your ZIL device
from power outages (and the lack of a capacitor in most modern SSDs).
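Adding the mirrored ZIL pair to a pool is a one-liner (device names
invented):

    # Attach two SSDs as a mirrored log (slog) device.
    zpool add tank log mirror c7t0d0 c7t1d0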
For L2ARC, go big. Website files tend to be /very/ small, so you're in
the worst use case for dedup. With something like an X4540 and its huge
data capacity, get as much L2ARC SSD space as you can afford. Remember:
the dedup table costs roughly 250 bytes per block. If you have 1KB
blocks for all those little files, your L2ARC needs to be about 25% of
your data size just to hold the DDT. *Ouch* Now, you don't have to buy
the super-expensive stuff for L2ARC: the good old Intel X-25M works
just fine. Don't mirror them.
Given the explosive potential size of your DDT, I'd think long and hard
about which data you really want to Dedup. Disk is cheap, but SSD
isn't. Good news is that you can selectively decide which data sets to
Dedup. Ain't ZFS great?
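Since dedup and compression are per-dataset properties, cherry-picking
is trivial (dataset names invented):

    # Enable dedup only where the data actually duplicates.
    zfs set dedup=on tank/vmimages
    zfs set dedup=off tank/logs

    # Compression is independent, so compress the logs instead.
    zfs set compression=lzjb tank/logs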
* What pool structure should I use?
If it were me (and, given what little I know of your data), I'd go like
this:

(1) pool for VMs:
        8 disks, MIRRORED
        1 SSD for L2ARC
        one zvol per VM instance, served via iSCSI, each with
        DD turned ON, Compression turned OFF
(1) pool for clients to write data to (log files, incoming data, etc.):
        6 or 8 disks, MIRRORED
        2 SSDs for ZIL, mirrored
        ideally, as many filesystems as you have webSITES, not just
        client VMs; as this might be unwieldy for 100s of websites, you
        should segregate them into obvious groupings, taking care with
        read/write permissions
        NFS served
        DD OFF, Compression ON (or OFF, if you seem to be having CPU
        overload on the X4540)
(1) pool for client read-only data:
        all the rest of the disks, split into 7 or 8-disk RAIDZ2 vdevs
        all the remaining SSDs for L2ARC
        as many filesystems as you have webSITES, not just client VMs
        (however, see above)
        NFS served
        DD ON for selected websites (filesystems), Compression ON for
        everything
(2) Global hot spares.
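Translated into commands, the above would look something like this.
It's only a sketch - every device name is invented, and on a real X4540
you'd spread each vdev across the six controllers:

    # (1) VM pool: mirrored pairs, 1 L2ARC SSD, one zvol per VM
    zpool create vmpool mirror c0t0d0 c1t0d0 mirror c0t1d0 c1t1d0 \
        mirror c0t2d0 c1t2d0 mirror c0t3d0 c1t3d0 cache c6t0d0
    zfs set dedup=on vmpool
    zfs set compression=off vmpool
    zfs create -V 40g vmpool/vm-web01       # repeat per VM

    # (1) write pool: mirrored pairs plus the mirrored ZIL SSDs
    zpool create writepool mirror c0t4d0 c1t4d0 mirror c0t5d0 c1t5d0 \
        mirror c0t6d0 c1t6d0 log mirror c6t1d0 c6t2d0
    zfs set compression=lzjb writepool
    zfs create writepool/group-a            # one filesystem per site group
    zfs set sharenfs=rw writepool/group-a

    # (1) read-mostly pool: 7-disk RAIDZ2 vdevs (repeat the raidz2
    # clause for the remaining disks) plus the rest of the L2ARC SSDs
    zpool create readpool raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 \
        c2t5d0 c2t6d0 cache c6t3d0 c6t4d0 c6t5d0
    zfs set compression=lzjb readpool
    zfs create readpool/site-groupA
    zfs set sharenfs=ro readpool/site-groupA
    zfs set dedup=on readpool/site-groupA   # only where it pays off

    # (2) global hot spares - the same devices can be shared across pools
    zpool add vmpool spare c5t6d0 c5t7d0
    zpool add writepool spare c5t6d0 c5t7d0
    zpool add readpool spare c5t6d0 c5t7d0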
I know these questions are slightly vague, but any input would be
greatly appreciated.
Thanks!
Which Virtual Machine technology are you going to use?
VirtualBox
VMWare
Xen
Solaris Zones
Something else...
It will make a difference as to my recommendation (or, do you want
me to recommend a VM type, too?)
<grin>
--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss