Re: [zfs-discuss] ZFS configuration suggestion with 24 drives
On Jan 29, 2010, at 9:12 AM, Scott Meilicke wrote:

> Link aggregation can use different algorithms to load balance. Using L4 (IP
> plus originating port, I think), using a single client computer and the same
> protocol (NFS), but different origination ports has allowed me to saturate
> both NICs in my LAG. So yes, you just need more than one 'conversation', but
> the LAG setup will determine how a conversation is defined.

A more flexible solution for iSCSI is to use MP[x]IO on the client. In my experience, most people who try link aggregation become unhappy with it and move up the stack for better redundancy and efficiency.

-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
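For reference, the kind of LAG setup Scott describes can be configured on Solaris with dladm. This is a minimal sketch, assuming a current dladm and LACP-capable switch ports; the interface names (e1000g0/e1000g1) and aggregation name are placeholders for whatever your hardware reports:

```shell
# Create a two-port LACP aggregation that hashes on L4 (IP + port),
# so multiple connections from one client can spread across links.
# Interface and aggregation names are placeholders -- adjust to your system.
dladm create-aggr -L active -P L4 -l e1000g0 -l e1000g1 aggr1
dladm show-aggr aggr1    # verify ports and hashing policy
```

Note the -P L4 policy is what lets two NFS conversations from the same client land on different member links.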
Re: [zfs-discuss] ZFS configuration suggestion with 24 drives
Link aggregation can use different algorithms to load balance. Using L4 (IP plus originating port, I think), using a single client computer and the same protocol (NFS), but different origination ports has allowed me to saturate both NICs in my LAG. So yes, you just need more than one 'conversation', but the LAG setup will determine how a conversation is defined.

Scott

--
This message posted from opensolaris.org
Re: [zfs-discuss] ZFS configuration suggestion with 24 drives
Thomas Burgess wrote:
> On Fri, Jan 29, 2010 at 5:54 AM, Edward Ned Harvey <sola...@nedharvey.com> wrote:
>> When you aggregate links together, say, 4x 1Gb ports, you are of course
>> increasing the speed & reliability of the network interface, but you don't
>> get something like a 4Gb port. Instead, you get a link where any one client
>> TCP or whatever connection will max out at 1Gb, but the advantage is, while
>> one client is maxing out at 1Gb, another client can come along and also max
>> out another 1Gb, and a 3rd client ... and a 4th client ...
>
> Isn't that basically the same thing? I mean, if you have 4x 1Gb as in your
> example, can you have 4 clients connected at the same time, all over Gb
> ethernet, all getting close to 1Gb/s? Isn't this LIKE having a 4Gb/s
> connection, considering everything ELSE on your network is essentially
> limited by their small 1Gb/s connections? Also, doesn't it provide a level
> of fault tolerance as well as load balancing?

I'm not 100% sure that all traffic between two hosts is now absolutely limited to the size of a single member link. The standard requires all traffic for a single "conversation" to happen over a single link (to avoid ethernet packet reordering), but I /think/ modern implementations no longer group all traffic between two hosts over an aggregated link as a single "conversation".

I'd have to check, but I think what that means nowadays is that any /single/ connection across an aggregated link maxes out at the speed of one of the component links, but that there is nothing preventing /multiple/ connections between two hosts from using different component links. E.g. you could have an HTTP and an FTP connection each use different links, even though both have the same two machines involved. But, someone, please correct me on this if I'm wrong. And, we're getting pretty far off topic here...

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
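The per-conversation hashing being discussed here can be illustrated with a toy version of an L4 hash policy. The XOR-mod scheme below is purely illustrative (real switches and NICs use their own hash functions): same host pair, same destination port, but different source ports select different member links.

```shell
# Toy L4 hash over a 4-link aggregation: flows between the SAME two
# hosts land on different links when the source port differs.
links=4
dport=80
for sport in 50000 50001 50002 50003; do
  echo "source port $sport -> link $(( (sport ^ dport) % links ))"
done
```

Each flow stays pinned to one link (no packet reordering), yet four concurrent flows between the same pair of hosts can use all four links.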
Re: [zfs-discuss] ZFS configuration suggestion with 24 drives
On Fri, Jan 29, 2010 at 5:54 AM, Edward Ned Harvey wrote:
> When you aggregate links together, say, 4x 1Gb ports, you are of course
> increasing the speed & reliability of the network interface, but you don't
> get something like a 4Gb port. Instead, you get a link where any one client
> TCP or whatever connection will max out at 1Gb, but the advantage is, while
> one client is maxing out at 1Gb, another client can come along and also max
> out another 1Gb, and a 3rd client ... and a 4th client ...
>
> Make sense? Obvious?

Isn't that basically the same thing? I mean, if you have 4x 1Gb as in your example, can you have 4 clients connected at the same time, all over Gb ethernet, all getting close to 1Gb/s? Isn't this LIKE having a 4Gb/s connection, considering everything ELSE on your network is essentially limited by their small 1Gb/s connections? Also, doesn't it provide a level of fault tolerance as well as load balancing?
Re: [zfs-discuss] ZFS configuration suggestion with 24 drives
> Thanks for the responses guys. It looks like I'll probably use RaidZ2
> with 8 drives. [...] We'll probably link aggregate two GigE ports onto
> the switch to boost the incoming bandwidth. [...] We considered a SSD
> ZIL as well but from my understanding it won't help much on sequential
> bulk writes but really helps on random writes (to sequence going to
> disk better). Also, doubt L2ARC/ARC will help that much for sequential
> either. I could be wrong on both counts here so please correct me if
> I'm wrong.

I believe you're correct on all points.

The one comment I want to add, as a tangent, is about link aggregation. You may already know this, but a lot of people don't, so please forgive me if I'm saying something obvious.

When you aggregate links together, say, 4x 1Gb ports, you are of course increasing the speed & reliability of the network interface, but you don't get something like a 4Gb port. Instead, you get a link where any one client TCP or whatever connection will max out at 1Gb, but the advantage is, while one client is maxing out at 1Gb, another client can come along and also max out another 1Gb, and a 3rd client ... and a 4th client ...

Make sense? Obvious?
Re: [zfs-discuss] ZFS configuration suggestion with 24 drives
On Thu, Jan 28, 2010 at 09:33:19PM -0800, Ed Fang wrote:
> We considered a SSD ZIL as well but from my understanding it won't
> help much on sequential bulk writes but really helps on random
> writes (to sequence going to disk better).

slog will only help if your write load involves lots of synchronous writes: typically apps calling fsync() or using O_*SYNC, and writes via NFS. Random vs sequential isn't important (though sync random writes can be worse for the combination). Otherwise, it won't help. zilstat.sh will help you figure out if it will.

If the workload would be helped by slog at all, raidz might be helped the most, since it's the most limited for total IOPS (vs mirror).

> Also, doubt L2ARC/ARC will help that much for sequential either.

Maybe, maybe not. It depends mostly on how often you re-stream the same content, so the cache can be hit often enough to be worthwhile. At the other end, with decent RAM and lots of repeated content, you might not even see much benefit from l2arc if enough fits in l1arc :)

I didn't mention it when talking about performance, even if it might reduce disk load with a good hit ratio, because l2arc (currently) starts cold after each reboot. If you need to stream N clients at rate X, you probably need to do so from boot and can't wait for the cache to warm up. Cache might help you keep doing so after a while, with less work, but for a discussion of the underlying pool storage the base requirement is the same.

-- Dan.
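For anyone following along, a hypothetical invocation of the zilstat check Dan suggests (zilstat is Richard Elling's DTrace-based script and must be fetched separately; the path and sampling arguments here are assumptions, so check the script's own usage text):

```shell
# Sample ZIL activity while running the bulk-load workload:
# ten one-second samples (interval, then count).
./zilstat.ksh 1 10
# Sustained non-zero byte counts during the load indicate synchronous
# writes that a slog device could absorb; all zeros means a slog
# would sit idle for this workload.
```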
Re: [zfs-discuss] ZFS configuration suggestion with 24 drives
Thanks for the responses guys. It looks like I'll probably use RaidZ2 with 8 drives. The write bandwidth isn't that great as it'll be a hundred gigs every couple weeks, but in a bulk load type of environment. So, not a major issue. Testing with 8 drives in a raidz2 easily saturated a GigE connection on the client and the server side. We'll probably link aggregate two GigE ports onto the switch to boost the incoming bandwidth.

In response to some of the other questions - drives are SATA drives, 7200rpm, all connected via a SAS expander backplane onto a machine. CPU cycles obviously aren't an issue on a Xeon machine/24Gig memory. We considered a SSD ZIL as well but from my understanding it won't help much on sequential bulk writes but really helps on random writes (to sequence going to disk better). Also, doubt L2ARC/ARC will help that much for sequential either. I could be wrong on both counts here so please correct me if I'm wrong.

Currently testing with 8 disk RaidZ2 to see how that performs. As it isn't speed critical, this will probably be the sweet spot between storage and reliability for us.
Re: [zfs-discuss] ZFS configuration suggestion with 24 drives
On Thu, Jan 28, 2010 at 07:26:42AM -0800, Ed Fang wrote:
> 4 x x6 vdevs in RaidZ1 configuration
> 3 x x8 vdevs in RaidZ2 configuration

Another choice might be

  2 x x12 vdevs in RaidZ2 configuration

This gets you the space of the first, with the recovery properties of the second - at a cost in potential performance. Your workload (mostly streaming, not many parallel streams, large files) sounds like it might be one that can tolerate this cost, but care will be needed. Experiment and measure, if you can.

2 x x12 could also get you to raidz3, for extra safety, making the same performance tradeoff against 3 x x8 with constant space. I don't think this is a choice you're likely to want, but it's worth mentioning.

> Obviously if a drive fails, it'll take a good several days to
> resilver. The data is important but not critical.

That's important information.

> Using raidz1 allows you one drive failure, but my understanding is
> that if the zpool has four vdevs using raidz1, then any single vdev
> failure of more than one drive may fail the entire zpool

Correct, as already discussed. However, there are actually two questions here, and your final decision depends on both:

 - how many vdevs of what type?
 - how many pools?

Do you need all the space available in one common pool, or can your application distribute space and load between multiple resource containers? You probably have more degrees of trade-off freedom, even for the same choices of base vdevs.

If space is more important to you, and losing 1/4 of your non-critical files on a second disk failure is a tolerable risk, you might consider 4 pools of 6-disk raidz1. Likewise, 3 pools of 8-disk raidz2 reduce the worst impact of a third disk failure to 1/3 of your data, and 2 pools of 12-disk vdevs to 1/2.

> If that is the case, then it sounds better to consider 3 x8 with
> raidz2.

Others have recommended raidz2, and I agree with them, in general principle.

All that said, for large files that will fill large blocks, I'm wary of raidz pools with an odd number of data disks, and prefer if possible a power-of-two number of data disks (plus whatever redundancy level you choose). Raid-z striping can leave holes, and this seems like it may result in inefficiencies, either in space, fragmentation or just extra work. I have not measured this, and it may be irrelevant or invisible, generally or in your workload.

So, I would recommend raidz2 vdevs, either 3x8 or 2x12. Test and compare the performance under your workload and see if you can afford the cost of the extra space the wide stripes offer. Test the performance while scrubs and resilvers are going on, as well as real workload. If 2x12 can carry this for you, go for it. Then choose whether to combine the vdevs into a big pool, or keep them separate.

-- Dan.
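The space side of this trade-off is easy to tabulate. A quick sketch, assuming hypothetical 1 TB drives in the 24-bay chassis (usable space before ZFS metadata and raidz padding overhead, which will reduce real figures somewhat):

```shell
# Usable capacity (TB) per layout: vdevs x (drives - parity) x drive size.
drive_tb=1

usable() {
  # args: vdev_count drives_per_vdev parity_per_vdev
  echo $(( $1 * ($2 - $3) * drive_tb ))
}

echo "4 x 6-disk raidz1:  $(usable 4 6 1) TB"
echo "3 x 8-disk raidz2:  $(usable 3 8 2) TB"
echo "2 x 12-disk raidz2: $(usable 2 12 2) TB"
echo "2 x 12-disk raidz3: $(usable 2 12 3) TB"
```

This confirms the point above: 2x12 raidz2 matches the 4x6 raidz1 layout in raw usable space while tolerating two failures per vdev.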
Re: [zfs-discuss] ZFS configuration suggestion with 24 drives
> Replacing my current media server with another larger capacity media
> server. Also switching over to solaris/zfs.
>
> Anyhow we have 24 drive capacity. These are for large sequential
> access (large media files) used by no more than 3 or 5 users at a time.

What type of disks are you using, and how fast is your network? Will it be mostly read operations, or a lot of write operations too? Do you care about making sure the filer can keep up with the speed of the network?

Typical 7200rpm SATA disks can sustain approx 500Mbps, and therefore a 2-disk mirror can sustainably max out a Gb Ethernet. A bunch of 2-disk mirrors striped together would definitely be able to keep up.

People often mistakenly think that raidz or raidz2 perform well, like a bunch of disks working as a team. In my tests, a raid5 configuration usually performs slower than a single disk, especially for writes. (Note: I said raid5, not raidz. I haven't tested zfs to see if raidz can outperform raid5 on an enterprise LSI raid controller fully accelerated.)

If you want performance, go with a bunch of mirrors striped together. If you want to keep your GB/$ maximized, go for raidz. In either configuration, it is highly advisable to keep all disks identically sized, and have a hot spare.

Also, if you get a single (doesn't need to be redundant) high-performance SSD (can be small ... 32G or whatnot) disk to use for the ZIL, you get a performance boost that way too. I emphasize high performance, because not all cheap SSDs outperform real hard drives.
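The claim above can be sanity-checked with trivial arithmetic. The ~500 Mb/s per-disk figure is the rough estimate quoted in the message, not a measured value, and the 11-pair layout is one hypothetical way to fill the 24 bays while leaving two for spares:

```shell
per_disk_mbps=500   # rough sustained rate per 7200rpm SATA disk (assumed)
gbe_mbps=1000

# One 2-disk mirror: reads can be serviced from both sides, writes
# must hit both disks, so the pair writes at single-disk speed.
echo "one mirror pair:  $(( 2 * per_disk_mbps )) Mb/s read, ${per_disk_mbps} Mb/s write"

# Eleven mirror pairs (22 of 24 bays, 2 left as spares), striped:
pairs=11
echo "striped mirrors:  $(( pairs * per_disk_mbps )) Mb/s write vs ${gbe_mbps} Mb/s GbE"
```

Even at single-disk write speed per pair, the striped-mirror pool outruns a GbE link by a wide margin, which is why the network, not the pool, tends to be the bottleneck here.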
Re: [zfs-discuss] ZFS configuration suggestion with 24 drives
Personally, I'd go with 4x raidz2 vdevs, each with 6 drives. You may not get as much raw storage space, but you can lose up to 2 drives per vdev, and you'll get more IOPS than with a 3x vdev setup.

Our current 24-drive storage servers use the 3x raidz2 vdevs with 8 drives in each. Performance is good, but not great (tops out at 300 MBps using SATA drives and controllers). This is using 2 12-port RAID controllers, so one of the vdevs is split across the controllers. If I could rebuild things from scratch, I'd go with 4x 8-port SATA controllers, and use 4x 6-drive raidz2, using a separate controller for each vdev.
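For concreteness, a sketch of the 4x 6-drive raidz2 layout recommended above. The device names (c0t0d0 and so on) are placeholders for whatever format(1M) reports on your system, and the pool name "tank" is arbitrary:

```shell
# 24 bays as four 6-disk raidz2 vdevs in one pool. Each "raidz2"
# keyword starts a new vdev; writes are striped across all four.
zpool create tank \
  raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 \
  raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
  raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 \
  raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0

zpool status tank   # verify all four vdevs came up ONLINE
```

Grouping each vdev's six disks on its own controller, as suggested, means a controller failure degrades (rather than destroys) only one vdev.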
Re: [zfs-discuss] ZFS configuration suggestion with 24 drives
It looks like there is not a free slot for a hot spare? If that is the case, then it is one more factor pushing towards raidz2, as you will need time to remove the failed disk and insert a new one. During that time you don't want to be left unprotected.
Re: [zfs-discuss] ZFS configuration suggestion with 24 drives
Some very interesting insights on the availability calculations:
http://blogs.sun.com/relling/entry/raid_recommendations_space_vs_mttdl

For streaming also look at:
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6732803

Regards,
Robert
Re: [zfs-discuss] ZFS configuration suggestion with 24 drives
If a vdev fails you lose the pool. If you go with raidz1 and 2 of the RIGHT drives fail (2 in the same vdev), your pool is lost.

I was faced with a similar situation recently and decided that raidz2 was the better option. It comes down to resilver times: if you look at how long it will take to replace a failed drive, then look at the likelihood of a drive failing during that process, raidz1 is much less attractive.

On Thu, Jan 28, 2010 at 10:26 AM, Ed Fang wrote:
> Replacing my current media server with another larger capacity media
> server. [...] I'm considering the following configurations
>
> 4 x x6 vdevs in RaidZ1 configuration
> 3 x x8 vdevs in RaidZ2 configuration
> [...]
> Am I on the right track here? Thanks
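The "RIGHT drives" risk is easy to put a rough number on. A back-of-envelope sketch, assuming a second failure strikes a uniformly random surviving drive during the multi-day resilver (real drives in one chassis fail in correlated ways, so treat these as optimistic floors):

```shell
# 4 x 6-disk raidz1, 24 drives total: one drive is resilvering,
# 23 survive, and the 5 others in the degraded vdev are the fatal ones.
echo "raidz1: chance a 2nd failure kills the pool: $(( 5 * 100 / 23 ))%"

# 3 x 8-disk raidz2: a 2nd failure in the same vdev (7 of 23 drives)
# still leaves parity intact -- it takes a 3rd hit there to lose data.
echo "raidz2: chance a 2nd failure merely degrades a vdev further: $(( 7 * 100 / 23 ))%"
```

Roughly one in five second failures destroys the raidz1 pool outright, while the raidz2 layout survives every possible second failure.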
[zfs-discuss] ZFS configuration suggestion with 24 drives
Replacing my current media server with another larger capacity media server. Also switching over to solaris/zfs.

Anyhow we have 24 drive capacity. These are for large sequential access (large media files) used by no more than 3 or 5 users at a time. I'm inquiring as to what the best configuration for this is for vdevs. I'm considering the following configurations:

  4 x x6 vdevs in RaidZ1 configuration
  3 x x8 vdevs in RaidZ2 configuration

Obviously if a drive fails, it'll take a good several days to resilver. The data is important but not critical. Using raidz1 allows you one drive failure, but my understanding is that if the zpool has four vdevs using raidz1, then any single vdev failure of more than one drive may fail the entire zpool. If that is the case, then it sounds better to consider 3 x8 with raidz2.

Am I on the right track here? Thanks