Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?
> One of the reasons I am investigating Solaris for this is that sparse
> volumes and dedup could really help here. Currently we use direct
> attached storage on the dom0s and allocate an LVM volume to the domU
> on creation. Just like your example above, we have lots of those "80G
> to start with please" volumes with tens of GB unused. I also think
> this data set would dedup quite well since there are a great many
> identical OS files across the domUs. Is that assumption correct?

This is one reason I like NFS - thin by default, and no wasted space inside a zvol. Zvols can be thin as well, but OpenSolaris will not know the internal format of the zvol, so you may still end up with a lot of wasted space after a while as files inside the zvol come and go. In theory dedup should work well on that data set, but I would be careful about a possible speed hit.

> I've not seen an example of that before. Do you mean having two 'head
> units' connected to an external JBOD enclosure, or a proper HA cluster
> type configuration where the entire thing, disks and all, is
> duplicated?

I have not done any cluster work myself, but from what I have read on Sun's site, yes: you connect the same JBOD to two head units, active/passive, in an HA cluster - no duplicate disks or JBOD. When the active node goes down, the passive node detects this and takes over the pool by doing an import. During the import, any outstanding transactions in the ZIL are replayed, whether they are on a slog or not. I believe this is how Sun does it on their open storage boxes (7000 series). A rough sketch of what the takeover amounts to is at the end of this mail.

Note - two JBODs could be used, one for each head unit, making an active/active setup. Each JBOD is active on one node and passive on the other.
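For what it is worth, the takeover on the surviving head amounts to little more than a forced import (the pool name is made up, and real cluster software adds heartbeat and fencing around this, which I am glossing over):

  # on the passive node, once the active node is confirmed dead
  zpool import -f tank   # -f because the pool was never cleanly exported;
                         # outstanding ZIL transactions are replayed during the import

That is also why the slog needs to live in the shared JBOD rather than inside a head unit - the importing node has to be able to see it to replay the log.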
Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?
> You do not need to mirror the L2ARC devices, as the system will just
> hit disk as necessary. Mirroring sounds like a good idea on the SLOG,
> but this has been much discussed on the forums.

Ah, ok.

> Interesting. I find IOPS is more proportional to the number of VMs
> than to disk space.
>
> User: I need a VM that will consume up to 80G in two years, so give
> me an 80G disk.
> Me: OK, but recall we can expand disks and filesystems on the fly,
> without downtime.
> User: Well, that is cool, but 80G to start with please.
> Me:

One of the reasons I am investigating Solaris for this is that sparse volumes and dedup could really help here. Currently we use direct attached storage on the dom0s and allocate an LVM volume to the domU on creation. Just like your example above, we have lots of those "80G to start with please" volumes with tens of GB unused. I also think this data set would dedup quite well since there are a great many identical OS files across the domUs. Is that assumption correct? (The sketch at the end of this mail is roughly what I have in mind.)

> I also believe the SLOG and L2ARC will make using high RPM disks not
> as necessary. But, from what I have read, higher RPM disks will
> greatly help with scrubs and resilvers. Maybe two pools - one with
> fast mirrored SAS, another with big SATA. Or all SATA, but one pool
> with mirrors, another with raidz2. Many options. But measure to see
> what works for you. iometer is great for that, I find.

Yes. As part of testing this I had planned to look at the performance of the config and try some other options too, such as a pool of 2 x mirrors. It's a classic case of balancing performance, cost and redundancy/time to resilver.

> One of the benefits of a SLOG on the SAS/SATA bus is for a cluster.
> If one node goes down, the other can bring up the pool, check the ZIL
> for any necessary transactions, and apply them. To do this with
> battery backed cache, you would need fancy interconnects between the
> nodes, cache mirroring, etc. All of those things that SAN array
> products do.

I've not seen an example of that before. Do you mean having two 'head units' connected to an external JBOD enclosure, or a proper HA cluster type configuration where the entire thing, disks and all, is duplicated?

Matt.
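P.S. In ZFS terms, this is roughly what I mean by "sparse volumes and dedup", in case I have the wrong end of the stick (pool and volume names are made up, and dedup needs a recent dev build - it is not in 2009.06):

  # dedup is a per-dataset property; enable it where the domU volumes will live
  zfs create -o dedup=on tank/domU

  # a sparse ("thin") 80G zvol for one guest; -s means no space is reserved up front
  zfs create -s -V 80G tank/domU/web01

  # expose it over iSCSI via COMSTAR (assumes the stmf and iscsi/target services are enabled)
  sbdadm create-lu /dev/zvol/rdsk/tank/domU/web01
  stmfadm add-view <GUID-printed-by-sbdadm>

So the 80G is only a promise - blocks get allocated as the guest actually writes - and identical OS blocks across guests would be stored once.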
Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?
> I was planning to mirror them - mainly in the hope that I could hot
> swap a new one in the event that an existing one started to degrade.
> I suppose I could start with one of each and convert to a mirror
> later, although the prospect of losing either disk fills me with
> dread.

You do not need to mirror the L2ARC devices, as the system will just hit disk as necessary if a cache device fails. Mirroring sounds like a good idea for the SLOG, but this has been much discussed on the forums. (A sketch of adding both kinds of device is at the end of this mail.)

>> Why not larger capacity disks?
> We will run out of IOPS before we run out of space.

Interesting. I find IOPS is more proportional to the number of VMs than to disk space.

User: I need a VM that will consume up to 80G in two years, so give me an 80G disk.
Me: OK, but recall we can expand disks and filesystems on the fly, without downtime.
User: Well, that is cool, but 80G to start with please.
Me:

I also believe the SLOG and L2ARC will make high RPM disks less necessary. But, from what I have read, higher RPM disks will greatly help with scrubs and resilvers. Maybe two pools - one with fast mirrored SAS, another with big SATA. Or all SATA, but one pool with mirrors, another with raidz2. Many options. But measure to see what works for you; iometer is great for that, I find.

> Any opinions on the use of battery backed SAS adapters?

Surely these will help with performance in write-back mode, but I have not done any hard measurements. Anecdotally, my PERC 5/i in a Dell 2950 seemed to greatly help with IOPS on a five disk raidz. There are pros and cons. Search the forums, but off the top of my head:

1) SLOGs are much larger than controller caches;
2) only synced write activity is cached in the ZIL, whereas a controller cache will cache everything, needed or not, thus running out of space sooner;
3) SLOGs and L2ARC devices are specialized caches for write and read loads respectively, vs. the all-in-one cache of a controller;
4) a controller *may* be faster, since it uses RAM for the cache.

One of the benefits of a SLOG on the SAS/SATA bus is for a cluster. If one node goes down, the other can bring up the pool, check the ZIL for any necessary transactions, and apply them. To do this with battery backed cache, you would need fancy interconnects between the nodes, cache mirroring, etc. All of those things that SAN array products do.

Sounds like you have a fun project.
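For what it is worth, adding the cache and log devices after the fact is a one-liner each (device names are made up):

  # L2ARC: cache devices are striped, not mirrored - if one fails you just take cache misses
  zpool add tank cache c3t0d0 c3t1d0

  # SLOG: a mirrored log vdev, since an unmirrored slog dying with outstanding
  # transactions on it is the failure people worry about
  zpool add tank log mirror c3t2d0 c3t3d0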
Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?
> It is hard, as you note, to recommend a box without knowing the load.
> How many Linux boxes are you talking about?

This box will act as a backing store for a cluster of 3 or 4 XenServers with upwards of 50 VMs running at any one time.

> Will you mirror your SLOG, or load balance them? I ask because
> perhaps one will be enough, IO wise. My box has one SLOG (X25-E) and
> can support about 2600 IOPS using an iometer profile that closely
> approximates my work load. My ~100 VMs on 8 ESX boxes average around
> 1000 IOPS, but can peak 2-3x that during backups.

I was planning to mirror them - mainly in the hope that I could hot swap a new one in the event that an existing one started to degrade. I suppose I could start with one of each and convert to a mirror later (see the sketch at the end of this mail), although the prospect of losing either disk fills me with dread.

> Don't discount NFS. I absolutely love NFS for management and thin
> provisioning reasons. Much easier (to me) than managing iSCSI, and
> performance is similar. I highly recommend load testing both iSCSI
> and NFS before you go live. Crash consistent backups of your VMs are
> possible using NFS, and recovering a VM from a snapshot is a little
> easier using NFS, I find.

That's interesting feedback. Given how easy it is to create NFS and iSCSI shares in osol, I'll definitely try both and see how they compare.

> Why not larger capacity disks?

We will run out of IOPS before we run out of space. It is more likely that we will gradually replace some of the SATA drives with 6 Gbps SAS drives to help with that, and we've been mulling over using an LSI SAS 9211-8i controller to provide that upgrade path:
http://www.lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/internal/sas9211-8i/index.html

> Hopefully your switches support NIC aggregation?

Yes, we're hoping that a bond of 4 x NICs will cope.

Any opinions on the use of battery backed SAS adapters? It also occurred to me after writing this that perhaps we could use one and configure it to report writes as flushed to disk before they actually were. That might give a slight edge in performance in some cases, but I would prefer to have the data security instead, tbh.

Matt.
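P.S. If I have understood the docs correctly, starting with a single slog and mirroring it later should be non-disruptive - something along these lines (device names are made up):

  # start out with a single log device
  zpool add tank log c4t0d0

  # later, attach a second device to turn the existing slog into a mirror;
  # it resilvers in the background while the pool stays online
  zpool attach tank c4t0d0 c4t1d0

  zpool status tank   # the log should now show up as a two-way mirror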
Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?
It is hard, as you note, to recommend a box without knowing the load. How many Linux boxes are you talking about?

I think having a lot of space for your L2ARC is a great idea.

Will you mirror your SLOG, or load balance them? I ask because perhaps one will be enough, IO wise. My box has one SLOG (X25-E) and can support about 2600 IOPS using an iometer profile that closely approximates my work load. My ~100 VMs on 8 ESX boxes average around 1000 IOPS, but can peak 2-3x that during backups.

Don't discount NFS. I absolutely love NFS for management and thin provisioning reasons. Much easier (to me) than managing iSCSI, and performance is similar. I highly recommend load testing both iSCSI and NFS before you go live. Crash consistent backups of your VMs are possible using NFS, and recovering a VM from a snapshot is a little easier using NFS, I find. (A rough example is at the end of this mail.)

Why not larger capacity disks?

Hopefully your switches support NIC aggregation?

The only issue I have had on 2009.06 using iSCSI (a Windows VM directly attached to a 4 TB iSCSI volume) was solved and backported to 2009.06 (bug 6794994).

-Scott
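P.S. To illustrate what I mean about snapshot recovery being easier over NFS, roughly (dataset, snapshot and file names are made up):

  # share the VM filesystem over NFS and snapshot it on a schedule
  zfs set sharenfs=on tank/vms
  zfs snapshot tank/vms@nightly

  # recovering a single VM's disk file is just a copy out of the snapshot...
  cp /tank/vms/.zfs/snapshot/nightly/web01/web01.vmdk /tank/vms/web01/

  # ...or clone the snapshot to bring the old copy up alongside the current one
  zfs clone tank/vms@nightly tank/vms-restore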