Hi,

Thanks for the quick reply. Now that you mention it, we have a different issue: what is the advantage of keeping spare disks instead of including them in the raid-z array? If the system pool is on mirrored disks, I think that would be enough (hopefully). When one disk fails, isn't it better to have a spare disk on hold, rather than one more disk in the raid-z and no spares (or just a few)? Rephrased: is it safer and faster to replace a disk in a raid-z3 and rebuild the data from the other disks, or to have a raid-z2 with a spare disk?
Thank you,

On Mon, Nov 29, 2010 at 6:03 AM, Erik Trimble <erik.trim...@oracle.com> wrote:

> On 11/28/2010 1:51 PM, Paul Piscuc wrote:
>
>> Hi,
>>
>> We are a company that wants to replace our current storage layout with
>> one that uses ZFS. We have been testing it for a month now, and
>> everything looks promising. One element that we cannot determine is the
>> optimum number of disks in a raid-z pool. In the ZFS best practice
>> guide, 7, 9, and 11 disks are recommended for a single raid-z2. On the
>> other hand, another user specifies that the most important part is the
>> distribution of the default 128k record size across all the disks. So,
>> the recommended layout would be:
>>
>> 4-disk RAID-Z2 = 128KiB / 2 = 64KiB = good
>> 5-disk RAID-Z2 = 128KiB / 3 = ~43KiB = not good
>> 6-disk RAID-Z2 = 128KiB / 4 = 32KiB = good
>> 10-disk RAID-Z2 = 128KiB / 8 = 16KiB = good
>>
>> What are your recommendations regarding the number of disks? We are
>> planning to use 2 raid-z2 pools with 8+2 disks, 2 spares, 2 SSDs for
>> L2ARC, 2 SSDs for ZIL, 2 for syspool, and a similar machine for
>> replication.
>>
>> Thanks in advance,
>>
>
> You've hit on one of the hardest parts of using ZFS - optimization. The
> truth of the matter is that there is NO one-size-fits-all "best"
> solution. It depends heavily on your workload: access patterns, write
> patterns, type of I/O, and size of the average I/O request.
>
> A couple of things here:
>
> (1) Unless you are using zvols as "raw" disk partitions (for use with
> something like a database), the recordsize value is a MAXIMUM value, NOT
> an absolute value. Thus, if you have a ZFS filesystem with a record size
> of 128k, it will break up I/O into 128k chunks for writing, but it will
> also write smaller chunks. I forget what the minimum size is (512b or
> 1k, IIRC), but ZFS uses a variable block size, up to the maximum size
> specified in the "recordsize" property.
> So, if recordsize=128k and you have a 190k write I/O op, it will write a
> 128k chunk and a 64k chunk (64 being the smallest power of two greater
> than the remaining 62 KiB of data). It WON'T write two 128k chunks.
>
> (2) #1 comes up a bit when you have a mix of file sizes - for instance,
> home directories, where you have lots of small files (initialization
> files, source code, etc.) combined with some much larger files (images,
> mp3s, executable binaries, etc.). Such a filesystem will have a wide
> variety of chunk sizes, which makes optimization difficult, to say the
> least.
>
> (3) For *random* I/O, a raidz of any number of disks performs roughly
> like a *single* disk in terms of IOPS, and a little better than a single
> disk in terms of throughput. So, if you have considerable amounts of
> random I/O, you should either use small raidz configs (no more than 4
> data disks) or switch to mirrors instead.
>
> (4) For *sequential* or large-size I/O, a raidz performs roughly
> equivalently to a stripe of the same number of data disks. That is, an
> N-disk raidz2 will perform about the same as an (N-2)-disk stripe in
> terms of throughput and IOPS.
>
> (5) As mentioned in #1, *all* ZFS I/O is broken up into
> powers-of-two-sized chunks, even if the last chunk must contain some
> padding to reach a power of two. This has implications for the best
> number of disks in a raidz(n).
>
> I'd have to re-check the ZFS Best Practices Guide, but I'm pretty sure
> the recommendation of 7, 9, or 11 disks was for a raidz1, NOT a raidz2.
> Due to #5 above, best performance comes with an EVEN number of data
> disks in any raidz, so a write to any disk is always a full portion of
> the chunk rather than a partial one (that sounds funny, but trust me).
> The best balance of size, IOPS, and throughput is found in the mid-size
> raidz(n) configs, with 4, 6, or 8 data disks.
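The arithmetic in points (1) and (5), plus the 128KiB-per-data-disk table quoted above, can be sketched in a few lines. This is illustration only, not ZFS code: the helper names (`split_into_chunks`, `per_disk_stripe`) are hypothetical, and the model ignores ZFS metadata, compression, and on-disk padding details.

```python
# Hypothetical model of the mailing-list arithmetic only,
# not actual ZFS on-disk behavior.

def split_into_chunks(size_kib, recordsize_kib=128, min_kib=1):
    """Split a write into recordsize-capped, power-of-two chunks as in
    point (1): full records first, then the smallest power of two that
    covers the remainder."""
    chunks = []
    while size_kib >= recordsize_kib:
        chunks.append(recordsize_kib)
        size_kib -= recordsize_kib
    if size_kib > 0:
        p = min_kib
        while p < size_kib:
            p *= 2
        chunks.append(p)  # remainder rounded up to a power of two
    return chunks

def per_disk_stripe(total_disks, parity, recordsize_kib=128):
    """Per-data-disk share of one full record in an N-disk raidz(parity),
    as in the 128KiB / (N - parity) table quoted above."""
    data_disks = total_disks - parity
    return recordsize_kib / data_disks

# The 190k example from the thread: one 128k chunk plus one 64k chunk.
print(split_into_chunks(190))  # [128, 64]

# The quoted RAID-Z2 table: 4, 5, 6, and 10 total disks.
for n in (4, 5, 6, 10):
    print(n, per_disk_stripe(n, parity=2))
```

Note how the 5-disk case lands on a non-power-of-two ~42.67 KiB per disk, which is exactly why the quoted table flags it as "not good".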
> Honestly, even with you describing a workload, it will be hard for us to
> give you an exact answer. My best suggestion is to do some testing with
> raidz(n) of different sizes, to see the tradeoffs between size and
> performance.
>
> Also, in your sample config, unless you plan to use the spare disks for
> redundancy on the boot mirror, it would be better to configure 2 x
> 11-disk raidz3 than 2 x 10-disk raidz2 + 2 spares. Better reliability.
>
> --
> Erik Trimble
> Java System Support
> Mailstop: usca22-123
> Phone: x17195
> Santa Clara, CA
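The trade-off behind that last suggestion is easy to check with simple counting: both layouts use 22 drives and yield the same number of data disks, but the raidz3 layout tolerates a third simultaneous failure within a vdev immediately, while the raidz2-plus-spare layout is exposed until a spare finishes resilvering. A rough sketch (the `layout` helper is hypothetical; it only counts disks and parity, ignoring resilver time and spare kick-in):

```python
# Rough comparison of the two 22-drive layouts from the thread.
# Counting model only: data disks and immediately tolerated failures
# per vdev; ignores resilver time and hot-spare activation.

def layout(vdevs, disks_per_vdev, parity, spares=0):
    return {
        "total_drives": vdevs * disks_per_vdev + spares,
        "data_disks": vdevs * (disks_per_vdev - parity),
        "failures_tolerated_per_vdev": parity,
        "spares": spares,
    }

raidz3 = layout(vdevs=2, disks_per_vdev=11, parity=3)
raidz2_spares = layout(vdevs=2, disks_per_vdev=10, parity=2, spares=2)

# Same drive count and same usable data-disk count...
print(raidz3["total_drives"], raidz2_spares["total_drives"])  # 22 22
print(raidz3["data_disks"], raidz2_spares["data_disks"])      # 16 16
# ...but raidz3 survives a 3rd concurrent failure in a vdev, while the
# raidz2 pool relies on a spare resilvering in after the 2nd.
print(raidz3["failures_tolerated_per_vdev"])                  # 3
print(raidz2_spares["failures_tolerated_per_vdev"])           # 2
```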
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss