Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?
> One of the reasons I am investigating Solaris for this is that sparse
> volumes and dedup could really help here. Currently we use direct
> attached storage on the dom0s and allocate an LVM volume to the domU
> on creation. Just like your example above, we have lots of those "80G
> to start with please" volumes with tens of GB unused. I also think
> this data set would dedup quite well since there are a great many
> identical OS files across the domUs. Is that assumption correct?

This is one reason I like NFS - thin by default, and no wasted space inside a zvol. Zvols can be thin as well, but OpenSolaris will not know the internal format of the zvol, so you may still end up with a lot of wasted space after a while as files inside the zvol come and go. In theory dedup should work well on that data set, but I would be careful about a possible speed hit.

> I've not seen an example of that before. Do you mean having two 'head
> units' connected to an external JBOD enclosure, or a proper HA cluster
> type configuration where the entire thing, disks and all, is
> duplicated?

I have not done any cluster work myself, but from what I have read on Sun's site, yes: you connect the same JBOD to two head units, active/passive, in an HA cluster - no duplicate disks or JBOD. When the active node goes down, the passive node detects this and takes over the pool by doing an import. During the import, any outstanding transactions in the ZIL are replayed, whether they are on a slog or not. I believe this is how Sun does it on their open storage boxes (7000 series). A rough sketch of what the takeover amounts to is at the end of this mail.

Note - two JBODs could be used, one for each head unit, making an active/active setup. Each JBOD is active on one node and passive on the other.
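For what it is worth, the takeover on the surviving head amounts to little more than a forced import (the pool name is made up, and real cluster software adds heartbeat and fencing around this, which I am glossing over):

  # on the passive node, once the active node is confirmed dead
  zpool import -f tank   # -f because the pool was never cleanly exported;
                         # outstanding ZIL transactions are replayed during the import

That is also why the slog needs to live in the shared JBOD rather than inside a head unit - the importing node has to be able to see it to replay the log.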
Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?
> You do not need to mirror the L2ARC devices, as the system will just
> hit disk as necessary. Mirroring sounds like a good idea on the SLOG,
> but this has been much discussed on the forums.

Ah, ok.

> Interesting. I find IOPS is more proportional to the number of VMs
> than to disk space.
>
> User: I need a VM that will consume up to 80G in two years, so give
> me an 80G disk.
> Me: OK, but recall we can expand disks and filesystems on the fly,
> without downtime.
> User: Well, that is cool, but 80G to start with please.
> Me:

One of the reasons I am investigating Solaris for this is that sparse volumes and dedup could really help here. Currently we use direct attached storage on the dom0s and allocate an LVM volume to the domU on creation. Just like your example above, we have lots of those "80G to start with please" volumes with tens of GB unused. I also think this data set would dedup quite well since there are a great many identical OS files across the domUs. Is that assumption correct? (The sketch at the end of this mail is roughly what I have in mind.)

> I also believe the SLOG and L2ARC will make using high RPM disks not
> as necessary. But, from what I have read, higher RPM disks will
> greatly help with scrubs and resilvers. Maybe two pools - one with
> fast mirrored SAS, another with big SATA. Or all SATA, but one pool
> with mirrors, another with raidz2. Many options. But measure to see
> what works for you. iometer is great for that, I find.

Yes. As part of testing this I had planned to look at the performance of the config and try some other options too, such as a pool of 2 x mirrors. It's a classic case of balancing performance, cost and redundancy/time to resilver.

> One of the benefits of a SLOG on the SAS/SATA bus is for a cluster.
> If one node goes down, the other can bring up the pool, check the ZIL
> for any necessary transactions, and apply them. To do this with
> battery backed cache, you would need fancy interconnects between the
> nodes, cache mirroring, etc. All of those things that SAN array
> products do.

I've not seen an example of that before. Do you mean having two 'head units' connected to an external JBOD enclosure, or a proper HA cluster type configuration where the entire thing, disks and all, is duplicated?

Matt.
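P.S. In ZFS terms, this is roughly what I mean by "sparse volumes and dedup", in case I have the wrong end of the stick (pool and volume names are made up, and dedup needs a recent dev build - it is not in 2009.06):

  # dedup is a per-dataset property; enable it where the domU volumes will live
  zfs create -o dedup=on tank/domU

  # a sparse ("thin") 80G zvol for one guest; -s means no space is reserved up front
  zfs create -s -V 80G tank/domU/web01

  # expose it over iSCSI via COMSTAR (assumes the stmf and iscsi/target services are enabled)
  sbdadm create-lu /dev/zvol/rdsk/tank/domU/web01
  stmfadm add-view <GUID-printed-by-sbdadm>

So the 80G is only a promise - blocks get allocated as the guest actually writes - and identical OS blocks across guests would be stored once.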
Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?
> I was planning to mirror them - mainly in the hope that I could hot
> swap a new one in the event that an existing one started to degrade.
> I suppose I could start with one of each and convert to a mirror
> later, although the prospect of losing either disk fills me with
> dread.

You do not need to mirror the L2ARC devices, as the system will just hit disk as necessary if a cache device fails. Mirroring sounds like a good idea for the SLOG, but this has been much discussed on the forums. (A sketch of adding both kinds of device is at the end of this mail.)

>> Why not larger capacity disks?
> We will run out of IOPS before we run out of space.

Interesting. I find IOPS is more proportional to the number of VMs than to disk space.

User: I need a VM that will consume up to 80G in two years, so give me an 80G disk.
Me: OK, but recall we can expand disks and filesystems on the fly, without downtime.
User: Well, that is cool, but 80G to start with please.
Me:

I also believe the SLOG and L2ARC will make high RPM disks less necessary. But, from what I have read, higher RPM disks will greatly help with scrubs and resilvers. Maybe two pools - one with fast mirrored SAS, another with big SATA. Or all SATA, but one pool with mirrors, another with raidz2. Many options. But measure to see what works for you; iometer is great for that, I find.

> Any opinions on the use of battery backed SAS adapters?

Surely these will help with performance in write-back mode, but I have not done any hard measurements. Anecdotally, my PERC 5/i in a Dell 2950 seemed to greatly help with IOPS on a five disk raidz. There are pros and cons. Search the forums, but off the top of my head:

1) SLOGs are much larger than controller caches;
2) only synced write activity is cached in the ZIL, whereas a controller cache will cache everything, needed or not, thus running out of space sooner;
3) SLOGs and L2ARC devices are specialized caches for write and read loads respectively, vs. the all-in-one cache of a controller;
4) a controller *may* be faster, since it uses RAM for the cache.

One of the benefits of a SLOG on the SAS/SATA bus is for a cluster. If one node goes down, the other can bring up the pool, check the ZIL for any necessary transactions, and apply them. To do this with battery backed cache, you would need fancy interconnects between the nodes, cache mirroring, etc. All of those things that SAN array products do.

Sounds like you have a fun project.
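For what it is worth, adding the cache and log devices after the fact is a one-liner each (device names are made up):

  # L2ARC: cache devices are striped, not mirrored - if one fails you just take cache misses
  zpool add tank cache c3t0d0 c3t1d0

  # SLOG: a mirrored log vdev, since an unmirrored slog dying with outstanding
  # transactions on it is the failure people worry about
  zpool add tank log mirror c3t2d0 c3t3d0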
Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?
> It is hard, as you note, to recommend a box without knowing the load.
> How many Linux boxes are you talking about?

This box will act as a backing store for a cluster of 3 or 4 XenServers with upwards of 50 VMs running at any one time.

> Will you mirror your SLOG, or load balance them? I ask because
> perhaps one will be enough, IO wise. My box has one SLOG (X25-E) and
> can support about 2600 IOPS using an iometer profile that closely
> approximates my work load. My ~100 VMs on 8 ESX boxes average around
> 1000 IOPS, but can peak 2-3x that during backups.

I was planning to mirror them - mainly in the hope that I could hot swap a new one in the event that an existing one started to degrade. I suppose I could start with one of each and convert to a mirror later (see the sketch at the end of this mail), although the prospect of losing either disk fills me with dread.

> Don't discount NFS. I absolutely love NFS for management and thin
> provisioning reasons. Much easier (to me) than managing iSCSI, and
> performance is similar. I highly recommend load testing both iSCSI
> and NFS before you go live. Crash consistent backups of your VMs are
> possible using NFS, and recovering a VM from a snapshot is a little
> easier using NFS, I find.

That's interesting feedback. Given how easy it is to create NFS and iSCSI shares in osol, I'll definitely try both and see how they compare.

> Why not larger capacity disks?

We will run out of IOPS before we run out of space. It is more likely that we will gradually replace some of the SATA drives with 6 Gbps SAS drives to help with that, and we've been mulling over using an LSI SAS 9211-8i controller to provide that upgrade path:
http://www.lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/internal/sas9211-8i/index.html

> Hopefully your switches support NIC aggregation?

Yes, we're hoping that a bond of 4 x NICs will cope.

Any opinions on the use of battery backed SAS adapters? It also occurred to me after writing this that perhaps we could use one and configure it to report writes as flushed to disk before they actually were. That might give a slight edge in performance in some cases, but I would prefer to have the data security instead, tbh.

Matt.
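P.S. If I have understood the docs correctly, starting with a single slog and mirroring it later should be non-disruptive - something along these lines (device names are made up):

  # start out with a single log device
  zpool add tank log c4t0d0

  # later, attach a second device to turn the existing slog into a mirror;
  # it resilvers in the background while the pool stays online
  zpool attach tank c4t0d0 c4t1d0

  zpool status tank   # the log should now show up as a two-way mirror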
Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?
It is hard, as you note, to recommend a box without knowing the load. How many Linux boxes are you talking about?

I think having a lot of space for your L2ARC is a great idea.

Will you mirror your SLOG, or load balance them? I ask because perhaps one will be enough, IO wise. My box has one SLOG (X25-E) and can support about 2600 IOPS using an iometer profile that closely approximates my work load. My ~100 VMs on 8 ESX boxes average around 1000 IOPS, but can peak 2-3x that during backups.

Don't discount NFS. I absolutely love NFS for management and thin provisioning reasons. Much easier (to me) than managing iSCSI, and performance is similar. I highly recommend load testing both iSCSI and NFS before you go live. Crash consistent backups of your VMs are possible using NFS, and recovering a VM from a snapshot is a little easier using NFS, I find. (A rough example is at the end of this mail.)

Why not larger capacity disks?

Hopefully your switches support NIC aggregation?

The only issue I have had on 2009.06 using iSCSI (a Windows VM directly attached to a 4 TB iSCSI volume) was solved and backported to 2009.06 (bug 6794994).

-Scott
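P.S. To illustrate what I mean about snapshot recovery being easier over NFS, roughly (dataset, snapshot and file names are made up):

  # share the VM filesystem over NFS and snapshot it on a schedule
  zfs set sharenfs=on tank/vms
  zfs snapshot tank/vms@nightly

  # recovering a single VM's disk file is just a copy out of the snapshot...
  cp /tank/vms/.zfs/snapshot/nightly/web01/web01.vmdk /tank/vms/web01/

  # ...or clone the snapshot to bring the old copy up alongside the current one
  zfs clone tank/vms@nightly tank/vms-restore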