[zfs-discuss] ZFS configuration suggestion with 24 drives

2010-01-28 Thread Ed Fang
Replacing my current media server with another, larger-capacity media server.  
Also switching over to Solaris/ZFS.

Anyhow, we have 24-drive capacity.  These are for large sequential access (large 
media files), used by no more than 3 to 5 users at a time.  I'm asking what the 
best vdev configuration for this would be.  I'm considering the following 
configurations:

4 x 6-drive vdevs in RaidZ1 configuration
3 x 8-drive vdevs in RaidZ2 configuration

Obviously if a drive fails, it'll take a good several days to resilver.  The 
data is important but not critical.  Using raidz1 allows for one drive failure 
per vdev, but my understanding is that if the zpool has four raidz1 vdevs, then 
losing more than one drive in any single vdev will fail the entire zpool.  If 
that is the case, then it sounds better to consider 3 x 8 with raidz2.
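
For concreteness, the two layouts would be built roughly like this (the disk 
names are just placeholders for the 24 bays):

    # option 1: four 6-drive raidz1 vdevs in one pool
    zpool create tank \
        raidz1 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 \
        raidz1 c0t6d0 c0t7d0 c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
        raidz1 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c2t0d0 c2t1d0 \
        raidz1 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0

    # option 2: three 8-drive raidz2 vdevs in one pool
    zpool create tank \
        raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 c0t7d0 \
        raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 \
        raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0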

Am I on the right track here?  Thanks


Re: [zfs-discuss] ZFS configuration suggestion with 24 drives

2010-01-28 Thread Thomas Burgess
If a vdev fails, you lose the pool.

If you go with raidz1 and 2 of the RIGHT drives fail (2 in the same vdev),
your pool is lost.

I was faced with a similar situation recently and decided that raidz2 was
the better option.

It comes down to resilver times.  If you look at how long it will take to
replace a failed drive, and then at the likelihood of a drive failing
during that process, raidz1 is much less attractive.
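
As a very rough back-of-envelope (the failure rate and resilver window below
are guesses, plug in your own numbers):

    # chance that one of the 5 surviving drives in a 6-disk raidz1 vdev
    # dies during a 3-day resilver, assuming a 5% annualized failure rate
    awk 'BEGIN { afr = 0.05; days = 3; n = 5;
                 p = 1 - exp(-afr * (days / 365) * n);
                 printf "P(second failure during resilver) ~= %.2f%%\n", p * 100 }'

That also ignores unrecoverable read errors hit during the rebuild, which are
usually the bigger worry on large SATA drives.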


On Thu, Jan 28, 2010 at 10:26 AM, Ed Fang  wrote:

> Replacing my current media server with another larger capacity media
> server.   Also switching over to solaris/zfs.
>
> Anyhow we have 24 drive capacity.  These are for large sequential access
> (large media files) used by no more than 3 or 5 users at a time.  I'm
> inquiring as to what the best configuration for this is for vdevs.  I'm
> considering the following configurations
>
> 4 x x6 vdevs in RaidZ1 configuration
> 3 x x8 vdevs in RaidZ2 configuration
>
> Obviously if a drive fails, it'll take a good several days to resilver.
>  The data is important but not critical.  Using raidz1 allows you one drive
> failure, but my understanding is that if the zpool has four vdevs using
> raidz1, then any single vdev failure of more than one drive may fail the
> entire zpool   If that is the case, then it sounds better to consider 3
> x8 with raidz2.
>
> Am I on the right track here ?  Thanks
> --
> This message posted from opensolaris.org


Re: [zfs-discuss] ZFS configuration suggestion with 24 drives

2010-01-28 Thread Lutz Schumann
Some very interesting insights on the availability calculations: 
  http://blogs.sun.com/relling/entry/raid_recommendations_space_vs_mttdl

For streaming also look at: 
   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6732803

Regards, 
Robert


Re: [zfs-discuss] ZFS configuration suggestion with 24 drives

2010-01-28 Thread Scott Meilicke
It looks like there is not a free slot for a hot spare? If that is the case, 
then it is one more factor to push towards raidz2, as you will need time to 
remove the failed disk and insert a new one. During that time you don't want to 
be left unprotected.
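
If a slot does free up later, a spare can be added to the pool at any time
with something like this (device name is just a placeholder):

    zpool add tank spare c5t7d0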


Re: [zfs-discuss] ZFS configuration suggestion with 24 drives

2010-01-28 Thread Freddie Cash
Personally, I'd go with 4x raidz2 vdevs, each with 6 drives.  You may not get 
as much raw storage space, but you can lose up to 2 drives per vdev, and you'll 
get more IOPS than with a 3x vdev setup.

Our current 24-drive storage servers use three 8-drive raidz2 vdevs.  
Performance is good, but not great (tops out at 300 MBps using SATA drives and 
controllers).  This is using two 12-port RAID controllers, so one of the vdevs 
is split across the controllers.

If I could rebuild things from scratch, I'd go with 4x 8-port SATA controllers, 
and use 4x 6-drive raidz2, using a separate controller for each vdev.
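
In other words, something roughly like this (controller/target names are made up):

    # one 6-drive raidz2 vdev per 8-port controller (c0..c3), leaving two
    # ports per controller free for spares, slog or l2arc later
    zpool create tank \
        raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 \
        raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
        raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 \
        raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0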


Re: [zfs-discuss] ZFS configuration suggestion with 24 drives

2010-01-28 Thread Edward Ned Harvey
> Replacing my current media server with another larger capacity media
> server.   Also switching over to solaris/zfs.
> 
> Anyhow we have 24 drive capacity.  These are for large sequential
> access (large media files) used by no more than 3 or 5 users at a time.

What type of disks are you using, and how fast is your network?  Will it be
mostly read operations, or a lot of write operations too?  Do you care about
making sure the filer can keep up with the speed of the network?

Typical 7200rpm SATA disks can sustain approx 500Mbps, and therefore a
2-disk mirror can sustainably max out a Gb Ethernet link.  A bunch of 2-disk
mirrors striped together would definitely be able to keep up.

People often mistakenly think that raidz or raidz2 perform well, like a
bunch of disks working as a team.  In my tests, a raid5 configuration
usually performs slower than a single disk, especially for writes.  (Note: I
said raid5, not raidz.  I haven't tested zfs to see if raidz can outperform
raid5 on an enterprise LSI raid controller fully accelerated.)

If you want performance, go with a bunch of mirrors striped together.  If
you want to keep your GB/$ maximized, go for raidz.

In either configuration, it is highly advisable to keep all disks
identically sized, and have a hotspare.

Also, if you get a single (doesn't need to be redundant) high-performance
SSD (can be small ... 32G or whatnot) to use for the ZIL, you get a
performance boost that way too.  I emphasize high performance, because not
all cheap SSDs outperform real hard drives.
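
A rough sketch of what I mean, scaled down and with placeholder device names
(a real 24-bay box would have 11 mirror pairs plus the spare and the SSD log):

    zpool create tank \
        mirror c0t0d0 c1t0d0 \
        mirror c0t1d0 c1t1d0 \
        mirror c0t2d0 c1t2d0 \
        spare c0t3d0 \
        log c2t0d0
    # the log or spare can also be added to an existing pool later:
    # zpool add tank log c2t0d0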



Re: [zfs-discuss] ZFS configuration suggestion with 24 drives

2010-01-28 Thread Daniel Carosone
On Thu, Jan 28, 2010 at 07:26:42AM -0800, Ed Fang wrote:
> 4 x x6 vdevs in RaidZ1 configuration
> 3 x x8 vdevs in RaidZ2 configuration

Another choice might be
 2 x 12-drive vdevs in raidz2 configuration

This gets you the space of the first, with the recovery properties of
the second - at a cost in potential performance.  Your workload
(mostly streaming, not many parallel streams, large files) sounds like
it might be one that can tolerate this cost, but care will be needed.
Experiment and measure, if you can.

2 x 12 could also get you to raidz3, for extra safety, making the
same performance tradeoff against 3 x 8 while keeping the space the same.  I
don't think this is a choice you're likely to want, but it's worth mentioning.

> Obviously if a drive fails, it'll take a good several days to
> resilver.  The data is important but not critical.  

That's important information.

> Using raidz1
> allows you one drive failure, but my understanding is that if the
> zpool has four vdevs using raidz1, then any single vdev failure of
> more than one drive may fail the entire zpool 

Correct, as already discussed.

However, there are actually two questions here, and your
final decision depends on both:

 - how many vdevs of what type?
 - how many pools?

Do you need all the space available in one common pool, or can your 
application distribute space and load between multiple resource
containers?   You probably have more degrees of trade-off freedom,
even for the same choices of base vdevs.

If space is more important to you, and losing 1/4 of your non-critical
files on a second disk failure is a tolerable risk, you might consider
4 pools of 6-disk raidz1.  Likewise, 3 pools of 8-disk raidz2 reduces
the worst impact of a third disk failure to 1/3 of your data, and 2
pools of 12-disk vdevs to 1/2.

> If that is the
> case, then it sounds better to consider 3 x8 with raidz2.  

Others have recommended raidz2, and I agree with them, in general
principle. 

All that said, for large files that will fill large blocks, I'm wary
of raidz pools with an odd number of data disks, and prefer, if
possible, a power-of-two number of data disks (plus whatever
redundancy level you choose).  Raidz striping can leave holes, and
this seems like it may result in inefficiencies, either in space,
fragmentation or just extra work.  I have not measured this, and it
may be irrelevant or invisible, generally or in your workload.
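
A crude illustration of what I mean - just dividing a full 128 KiB record
(256 x 512-byte sectors) across various data-disk counts, ignoring the parity
and padding details:

    awk 'BEGIN { for (n = 4; n <= 10; n++)
                   printf "%2d data disks: %6.2f sectors each\n", n, 256 / n }'

Only the power-of-two counts divide evenly; the rest leave a remainder that
has to be padded or split somewhere.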

So, I would recommend raidz2 vdevs, either 3x8 or 2x12.  Test and
compare the performance under your workload and see whether you can afford
the performance cost that comes with the extra space the wide stripes offer.
Test the performance while scrubs and resilvers are going on, as well as
under real workload.  If 2x12 can carry this for you, go for it.  Then choose
whether to combine the vdevs into a big pool, or keep them separate.
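
Something along these lines while the client load is running (pool name is a
placeholder):

    zpool scrub tank
    zpool status -v tank        # shows scrub/resilver progress
    zpool iostat -v tank 5      # per-vdev throughput every 5 seconds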

--
Dan.




Re: [zfs-discuss] ZFS configuration suggestion with 24 drives

2010-01-28 Thread Ed Fang
Thanks for the responses guys.  It looks like I'll probably use RaidZ2 with 8 
drives.  The write bandwidth requirement isn't that great, as it'll be a hundred 
gigs every couple of weeks in a bulk-load type of environment, so not a major 
issue.  Testing with 8 drives in a raidz2 easily saturated a GigE connection on 
both the client and the server side.  We'll probably link-aggregate two GigE 
ports on the switch to boost the incoming bandwidth.

In response to some of the other questions - the drives are 7200rpm SATA drives, 
all connected via a SAS expander backplane.  CPU cycles obviously aren't an 
issue on a Xeon machine with 24GB of memory.  We considered an SSD ZIL as well, 
but from my understanding it won't help much on sequential bulk writes, though 
it really helps on random writes (by sequencing what goes to disk better).  
Also, I doubt L2ARC/ARC will help that much for sequential either.  I could be 
wrong on both counts here, so please correct me if I'm wrong.

Currently testing with an 8-disk RaidZ2 to see how that performs.  As it isn't 
speed critical, this will probably be the sweet spot between storage and 
reliability for us.


Re: [zfs-discuss] ZFS configuration suggestion with 24 drives

2010-01-28 Thread Daniel Carosone
On Thu, Jan 28, 2010 at 09:33:19PM -0800, Ed Fang wrote:
> We considered a SSD ZIL as well but from my understanding it won't
> help much on sequential bulk writes but really helps on random
> writes (to sequence going to disk better).  

slog will only help if your write load involves lots of synchronous
writes; typically apps calling fsync() or using O_*SYNC, and writes
via NFS.  Random vs sequential isn't important (though sync random
writes can be worse for the combination).  Otherwise, it won't help.

zilstat.sh will help you figure out if it will.  If the workload
would be helped by slog at all, raidz might be helped the most, since
it's the most limited for total IOPS (vs mirror). 

> Also, doubt L2ARC/ARC will help that much for sequential either.

Maybe, maybe not.  It depends mostly on how often you re-stream the
same content, so the cache can be hit often enough to be worthwhile.
At the other end, with decent RAM and lots of repeated content, you
might not even see much benefit from l2arc if enough fits in l1arc :)
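
If you do want to experiment, l2arc is at least cheap to try, since a cache
device can be added to and removed from the pool at any time (device name is
a placeholder):

    zpool add tank cache c2t1d0
    zpool remove tank c2t1d0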

I didn't mention it when talking about performance, even though it might
reduce disk load with a good hit ratio, because l2arc (currently)
starts cold after each reboot.  If you need to stream N clients at rate
X, you probably need to do so from boot and can't wait for the cache
to warm up. 

Cache might help you keep doing so after a while, with less work, but
for a discussion of the underlying pool storage the base requirement
is the same. 

--
Dan.




Re: [zfs-discuss] ZFS configuration suggestion with 24 drives

2010-01-29 Thread Edward Ned Harvey
> Thanks for the responses guys.  It looks like I'll probably use RaidZ2
> with 8 drives.  The write bandwidth isn't that great as it'll be a
> hundred gigs every couple weeks but in a bulk load type of environment.
> So, not a major issue.  Testing with 8 drives in a raidz2 easily
> saturated a GigE connection on the client and the server side.  We'll
> probably link aggregate two GigE ports onto the switch to boost the
> incoming bandwidth.
> 
> In response to some of the other questions - drives are SATA drives
> 7200.  All connected via a SAS expander backplane onto a machine.  CPU
> cycles obviously aren't an issue on a Xeon machine/24Gig memory.  We
> considered a SSD ZIL as well but from my understanding it won't help
> much on sequential bulk writes but really helps on random writes (to
> sequence going to disk better).  Also, doubt L2ARC/ARC will help that
> much for sequential either.   I could be wrong on both counts here so
> please correct me if I'm wrong.

I believe you're correct on all points.  

The one comment I want to add, as a tangent, is about link aggregation.  You
may already know this, but a lot of people don't, so please forgive me if
I'm saying something obvious.

When you aggregate links together, say, 4x 1Gb ports, you are of course
increasing the speed & reliability of the network interface, but you don't
get something like a 4Gb port.  Instead, you get a link where any one client
TCP or whatever connection will max out at 1Gb, but the advantage is, while
one client is maxing out at 1Gb, another client can come along and also max
out another 1Gb, and a 3rd client ... and a 4th client ... 

Make sense?  Obvious?



Re: [zfs-discuss] ZFS configuration suggestion with 24 drives

2010-01-29 Thread Thomas Burgess
On Fri, Jan 29, 2010 at 5:54 AM, Edward Ned Harvey wrote:

> > Thanks for the responses guys.  It looks like I'll probably use RaidZ2
> > with 8 drives.  The write bandwidth isn't that great as it'll be a
> > hundred gigs every couple weeks but in a bulk load type of environment.
> > So, not a major issue.  Testing with 8 drives in a raidz2 easily
> > saturated a GigE connection on the client and the server side.  We'll
> > probably link aggregate two GigE ports onto the switch to boost the
> > incoming bandwidth.
> >
> > In response to some of the other questions - drives are SATA drives
> > 7200.  All connected via a SAS expander backplane onto a machine.  CPU
> > cycles obviously aren't an issue on a Xeon machine/24Gig memory.  We
> > considered a SSD ZIL as well but from my understanding it won't help
> > much on sequential bulk writes but really helps on random writes (to
> > sequence going to disk better).  Also, doubt L2ARC/ARC will help that
> > much for sequential either.   I could be wrong on both counts here so
> > please correct me if I'm wrong.
>
> I believe you're correct on all points.
>
> The one comment I want to add, as a tangent, is about link aggregation.
>  You
> may already know this, but a lot of people don't, so please forgive me if
> I'm saying something obvious.
>
> When you aggregate links together, say, 4x 1Gb ports, you are of course
> increasing the speed & reliability of the network interface, but you don't
> get something like a 4Gb port.  Instead, you get a link where any one
> client
> TCP or whatever connection will max out at 1Gb, but the advantage is, while
> one client is maxing out at 1Gb, another client can come along and also max
> out another 1Gb, and a 3rd client ... and a 4th client ...
>
> Make sense?  Obvious?
>
>

Isn't that basically the same thing... I mean:

If you have 4x 1Gb as in your example, can you have 4 clients connected at
the same time, all over Gb Ethernet, all getting close to 1Gb/s?

Isn't this LIKE having a 4Gb/s connection, considering everything ELSE on
your network is essentially limited by their small 1Gb/s connections?
Also, doesn't it provide a level of fault tolerance as well as load
balancing?



Re: [zfs-discuss] ZFS configuration suggestion with 24 drives

2010-01-29 Thread Erik Trimble

Thomas Burgess wrote:
> On Fri, Jan 29, 2010 at 5:54 AM, Edward Ned Harvey
> <sola...@nedharvey.com> wrote:
>
>>> Thanks for the responses guys.  It looks like I'll probably use RaidZ2
>>> with 8 drives.  The write bandwidth isn't that great as it'll be a
>>> hundred gigs every couple weeks but in a bulk load type of environment.
>>> So, not a major issue.  Testing with 8 drives in a raidz2 easily
>>> saturated a GigE connection on the client and the server side.  We'll
>>> probably link aggregate two GigE ports onto the switch to boost the
>>> incoming bandwidth.
>>>
>>> In response to some of the other questions - drives are SATA drives
>>> 7200.  All connected via a SAS expander backplane onto a machine.  CPU
>>> cycles obviously aren't an issue on a Xeon machine/24Gig memory.  We
>>> considered a SSD ZIL as well but from my understanding it won't help
>>> much on sequential bulk writes but really helps on random writes (to
>>> sequence going to disk better).  Also, doubt L2ARC/ARC will help that
>>> much for sequential either.  I could be wrong on both counts here so
>>> please correct me if I'm wrong.
>>
>> I believe you're correct on all points.
>>
>> The one comment I want to add, as a tangent, is about link aggregation.
>> You may already know this, but a lot of people don't, so please forgive
>> me if I'm saying something obvious.
>>
>> When you aggregate links together, say, 4x 1Gb ports, you are of course
>> increasing the speed & reliability of the network interface, but you
>> don't get something like a 4Gb port.  Instead, you get a link where any
>> one client TCP or whatever connection will max out at 1Gb, but the
>> advantage is, while one client is maxing out at 1Gb, another client can
>> come along and also max out another 1Gb, and a 3rd client ... and a 4th
>> client ...
>>
>> Make sense?  Obvious?
>
> Isn't that basically the same thing... I mean:
>
> If you have 4x 1Gb as in your example, can you have 4 clients connected
> at the same time, all over Gb Ethernet, all getting close to 1Gb/s?
>
> Isn't this LIKE having a 4Gb/s connection, considering everything ELSE on
> your network is essentially limited by their small 1Gb/s connections?
> Also, doesn't it provide a level of fault tolerance as well as load
> balancing?


I'm not 100% sure that all traffic between two hosts is now absolutely 
limited to the size of a single member link.  The standard requires all 
traffic for a single "conversation" to happen over a single link (to 
avoid ethernet packet reordering), but I /think/ modern implementations 
no longer group all traffic between two hosts over an aggregated link as 
a single "conversation". 

I'd have to check, but I think what that means nowadays is that any 
/single/ connection across an aggregated link maxes out at the speed of 
one of the component links, but that there is nothing preventing 
/multiple/ connections between two hosts from using different component 
links.  e.g. you could have an HTTP and FTP connection each use 
different links, even though both have the same two machines involved.


But, someone, please correct me on this if I'm wrong.


And, we're getting pretty far off topic here...

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)



Re: [zfs-discuss] ZFS configuration suggestion with 24 drives

2010-01-29 Thread Scott Meilicke
Link aggregation can use different algorithms to load balance.  Using L4 hashing 
(IP plus originating port, I think), a single client computer using the same 
protocol (NFS) but different originating ports has allowed me to saturate both 
NICs in my LAG.  So yes, you just need more than one 'conversation', but the LAG 
setup will determine how a conversation is defined.
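
For reference, on recent OpenSolaris builds the policy is set on the
aggregation itself - roughly like this, if I remember the dladm syntax right
(link names are examples, and the switch ports need a matching trunk/LACP
configuration):

    dladm create-aggr -P L4 -l e1000g0 -l e1000g1 aggr1
    dladm modify-aggr -P L4 aggr1    # or change the policy on an existing aggr
    dladm show-aggr
    # older releases use 'dladm create-aggr -d <dev> ... <key>' instead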

Scott


Re: [zfs-discuss] ZFS configuration suggestion with 24 drives

2010-01-29 Thread Richard Elling
On Jan 29, 2010, at 9:12 AM, Scott Meilicke wrote:

> Link aggregation can use different algorithms to load balance. Using L4 (IP 
> plus originating port I think), using a single client computer and the same 
> protocol (NFS), but different origination ports has allowed me to saturate 
> both NICS in my LAG. So yes, you just need more than one 'conversation', but 
> the LAG setup will determine how a conversation is defined.

A more flexible solution for iSCSI is to use MP[x]IO on the client.
In my experience, most people who try link aggregation become unhappy
with it and move up the stack for better redundancy and efficiency.
 -- richard
