Re: [zfs-discuss] Hardware RAID vs. ZFS RAID
Kyle McDonald wrote:
>> With my (COTS) LSI 1068 and 1078 based controllers I get consistently better performance when I export all disks as jbod (MegaCli -CfgEachDskRaid0).
>
> Is that really 'all disks as JBOD'? Or is it 'each disk as a single-drive RAID0'?

Single-disk RAID0:
./MegaCli -CfgEachDskRaid0 Direct -a0

> It may not sound different on the surface, but I asked in another thread and others confirmed that if your RAID card has a battery-backed cache, giving ZFS many single-drive RAID0's is much better than JBOD (using the 'nocacheflush' option may improve it even more). My understanding is that it's kind of like the best of both worlds: you get the higher number of spindles and vdevs for ZFS to manage, ZFS gets to do the redundancy, and the HW RAID cache gives virtually instant acknowledgement of writes, so that ZFS can be on its way. So I think many RAID0's is not always the same as JBOD. That's not to say that even true JBOD doesn't still have an advantage over HW RAID; I don't know that for sure.

I have tried mixing hardware and ZFS RAID, but from a performance or redundancy standpoint it just doesn't make sense to add those layers of complexity. In this case I'm building nearline storage, so there isn't even a battery attached, and I have disabled any caching on the controller. I have a Sun SAS HBA on the way, which is what I would ultimately use for JBOD attachment.

> But I think there is a use for HW RAID in ZFS configs, which wasn't always the theory I've heard.
>
>> I have really learned not to do it this way with raidz and raidz2:
>> #zpool create pool2 raidz c3t8d0 c3t9d0 c3t10d0 c3t11d0 c3t12d0 c3t13d0 c3t14d0 c3t15d0
>
> Why? I know creating raidz's with more than 9-12 devices is discouraged, but that doesn't cross that threshold. Is there a reason you'd split 8 disks up into 2 groups of 4? What experience led you to this? (Just so I don't have to repeat it.
> ;) )

I don't know why, but with most setups I have tested (8- and 16-drive configs), dividing into 4 disks per raidz vdev (or 5 per raidz2 vdev) performs better. Take a look at my simple dd test (filebench results as soon as I can figure out how to get it working properly with Solaris 10).

= 8 SATA 500gb disk system with LSI 1068 (MegaRAID ELP) - no BBU =

bash-3.00# zpool history
History for 'pool0-raidz':
2008-02-11.16:38:13 zpool create pool0-raidz raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0

bash-3.00# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
pool0-raidz   117K  3.10T  42.6K  /pool0-raidz

bash-3.00# time dd if=/dev/zero of=/pool0-raidz/w-test.lo0 bs=8192 count=131072; time sync
131072+0 records in
131072+0 records out
real    0m1.768s
user    0m0.080s
sys     0m1.688s
real    0m3.495s
user    0m0.001s
sys     0m0.013s

bash-3.00# time dd if=/pool0-raidz/w-test.lo0 of=/pool0-raidz/rw-test.lo0 bs=8192; time sync
131072+0 records in
131072+0 records out
real    0m6.994s
user    0m0.097s
sys     0m2.827s
real    0m1.043s
user    0m0.001s
sys     0m0.013s

bash-3.00# time dd if=/dev/zero of=/pool0-raidz/w-test.lo1 bs=8192 count=655360; time sync
655360+0 records in
655360+0 records out
real    0m24.064s
user    0m0.402s
sys     0m8.974s
real    0m1.629s
user    0m0.001s
sys     0m0.013s

bash-3.00# time dd if=/pool0-raidz/w-test.lo1 of=/pool0-raidz/rw-test.lo1 bs=8192; time sync
655360+0 records in
655360+0 records out
real    0m40.542s
user    0m0.476s
sys     0m16.077s
real    0m0.617s
user    0m0.001s
sys     0m0.013s

bash-3.00# time dd if=/pool0-raidz/w-test.lo0 of=/dev/null bs=8192; time sync
131072+0 records in
131072+0 records out
real    0m3.443s
user    0m0.084s
sys     0m1.327s
real    0m0.013s
user    0m0.001s
sys     0m0.013s

bash-3.00# time dd if=/pool0-raidz/w-test.lo1 of=/dev/null bs=8192; time sync
655360+0 records in
655360+0 records out
real    0m15.972s
user    0m0.413s
sys     0m6.589s
real    0m0.013s
user    0m0.001s
sys     0m0.012s

---

bash-3.00# zpool history
History for 'pool0-raidz':
2008-02-11.17:02:16 zpool create pool0-raidz raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0
2008-02-11.17:02:51 zpool add pool0-raidz raidz c2t4d0 c2t5d0 c2t6d0 c2t7d0

bash-3.00# zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
pool0-raidz   110K  2.67T  36.7K  /pool0-raidz

bash-3.00# time dd if=/dev/zero of=/pool0-raidz/w-test.lo0 bs=8192 count=131072; time sync
131072+0 records in
131072+0 records out
real    0m1.835s
user    0m0.079s
sys     0m1.687s
real    0m2.521s
user    0m0.001s
sys     0m0.013s

bash-3.00# time dd if=/pool0-raidz/w-test.lo0 of=/pool0-raidz/rw-test.lo0 bs=8192; time sync
131072+0 records in
131072+0 records out
real    0m2.376s
user    0m0.084s
sys     0m2.291s
real    0m2.578s
user    0m0.001s
sys     0m0.013s

bash-3.00# time dd if=/dev/zero of=/pool0-raidz/w-test.lo1 bs=8192 count=655360; time sync
655360+0 records in
655360+0 records out
real    0m19.531s
user    0m0.404s
sys     0m8.731s
real    0m2.255s
user    0m0.001s
sys
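For anyone wanting to repeat the method, here is a minimal, self-contained version of the write half of the test above. It writes to /tmp with a smaller count, so it measures whatever filesystem backs /tmp rather than a ZFS pool; the real runs against the pool paths and sizes are the ones shown above.

```shell
#!/bin/sh
# Sequential write: 1024 blocks of 8 KB = 8 MiB, followed by a sync to
# flush cached data, mirroring the bs=8192 style of the runs above.
OUT=/tmp/zfs_ddtest.bin
time dd if=/dev/zero of="$OUT" bs=8192 count=1024 2>/dev/null
time sync
# Report the resulting size: 1024 * 8192 = 8388608 bytes.
wc -c < "$OUT"
```

Scale count back up (and point OUT at a file on the pool) to reproduce the numbers quoted in the transcript.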
Re: [zfs-discuss] Hardware RAID vs. ZFS RAID
John-Paul Drawneek wrote:
| I guess a USB pendrive would be slower than a harddisk. Bad performance
| for the ZIL.

A decent pendrive of mine writes at 3-5MB/s. Sure, there are faster ones, but any desktop harddisk can write at 50MB/s. If you are *not* talking about consumer-grade pendrives, I can't comment.

--
Jesus Cea Avion - [EMAIL PROTECTED] - http://www.argo.es/~jcea/ - jabber/xmpp:[EMAIL PROTECTED]
"El amor es poner tu felicidad en la felicidad de otro" ("Love is putting your happiness in another's happiness") - Leibniz

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Hardware RAID vs. ZFS RAID
With my (COTS) LSI 1068 and 1078 based controllers I get consistently better performance when I export all disks as jbod (MegaCli -CfgEachDskRaid0). I even went through all the loops and hoops with 6120's, 6130's and even some SGI storage, and the result was always the same: better performance exporting single disks than even the ZFS profiles within CAM.

'pool0':
#zpool create pool0 mirror c2t0d0 c2t1d0
#zpool add pool0 mirror c2t2d0 c2t3d0
#zpool add pool0 mirror c2t4d0 c2t5d0
#zpool add pool0 mirror c2t6d0 c2t7d0

'pool2':
#zpool create pool2 raidz c3t8d0 c3t9d0 c3t10d0 c3t11d0
#zpool add pool2 raidz c3t12d0 c3t13d0 c3t14d0 c3t15d0

I have really learned not to do it this way with raidz and raidz2:
#zpool create pool2 raidz c3t8d0 c3t9d0 c3t10d0 c3t11d0 c3t12d0 c3t13d0 c3t14d0 c3t15d0

So when is thumper going to have an all-SAS option? :)

-Andy

On Feb 7, 2008, at 2:28 PM, Joel Miller wrote:
> Much of the complexity in hardware RAID is in the fault detection, isolation, and management. The fun part is trying to architect a fault-tolerant system when the suppliers of the components cannot come close to enumerating most of the possible failure modes. What happens when a drive's performance slows down because it is having to go through internal retries more than others? What layer gets to declare a drive dead? What happens when you start declaring drives dead one by one because they all seemed to stop responding, but the problem is not really the drives? Hardware RAID systems attempt to deal with problems that are not always straightforward... Hopefully we will eventually get similar functionality in Solaris...
>
> Understand that I am a proponent of ZFS, but everything has its use.
>
> -Joel
>
> This message posted from opensolaris.org
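As a rough sketch of what the 8-wide vs. two-4-wide raidz choice above costs in capacity (simple parity arithmetic, assuming 500 GB drives and ignoring ZFS metadata overhead):

```shell
#!/bin/sh
DISK_GB=500
# raidz gives up one disk per vdev to parity, so:
echo "one 8-disk raidz vdev:  $(( (8 - 1) * DISK_GB )) GB usable"
echo "two 4-disk raidz vdevs: $(( 2 * (4 - 1) * DISK_GB )) GB usable"
```

The split layout trades one disk's worth of capacity (3000 vs. 3500 GB here) for a second parity disk and more top-level vdevs for ZFS to schedule across.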
Re: [zfs-discuss] Hardware RAID vs. ZFS RAID
Much of the complexity in hardware RAID is in the fault detection, isolation, and management. The fun part is trying to architect a fault-tolerant system when the suppliers of the components cannot come close to enumerating most of the possible failure modes.

What happens when a drive's performance slows down because it is having to go through internal retries more than others? What layer gets to declare a drive dead? What happens when you start declaring drives dead one by one because they all seemed to stop responding, but the problem is not really the drives? Hardware RAID systems attempt to deal with problems that are not always straightforward... Hopefully we will eventually get similar functionality in Solaris...

Understand that I am a proponent of ZFS, but everything has its use.

-Joel
Re: [zfs-discuss] Hardware RAID vs. ZFS RAID
Andy Lubel wrote:
> With my (COTS) LSI 1068 and 1078 based controllers I get consistently better performance when I export all disks as jbod (MegaCli -CfgEachDskRaid0).

Is that really 'all disks as JBOD'? Or is it 'each disk as a single-drive RAID0'? It may not sound different on the surface, but I asked in another thread and others confirmed that if your RAID card has a battery-backed cache, giving ZFS many single-drive RAID0's is much better than JBOD (using the 'nocacheflush' option may improve it even more). My understanding is that it's kind of like the best of both worlds: you get the higher number of spindles and vdevs for ZFS to manage, ZFS gets to do the redundancy, and the HW RAID cache gives virtually instant acknowledgement of writes, so that ZFS can be on its way.

So I think many RAID0's is not always the same as JBOD. That's not to say that even true JBOD doesn't still have an advantage over HW RAID; I don't know that for sure. But I think there is a use for HW RAID in ZFS configs, which wasn't always the theory I've heard.

> I have really learned not to do it this way with raidz and raidz2:
> #zpool create pool2 raidz c3t8d0 c3t9d0 c3t10d0 c3t11d0 c3t12d0 c3t13d0 c3t14d0 c3t15d0

Why? I know creating raidz's with more than 9-12 devices is discouraged, but that doesn't cross that threshold. Is there a reason you'd split 8 disks up into 2 groups of 4? What experience led you to this? (Just so I don't have to repeat it. ;) )

-Kyle
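To make the distinction concrete, here is a sketch of the single-drive-RAID0 approach under discussion. The MegaCli invocation is the one quoted in the thread; the controller target and device names are illustrative, not a verified recipe for any particular box:

```shell
# Export each physical disk as its own single-drive RAID0 LUN, so writes
# can be acknowledged out of the controller's battery-backed cache:
./MegaCli -CfgEachDskRaid0 Direct -a0

# Then let ZFS own all the redundancy across those single-disk LUNs:
zpool create pool0 mirror c2t0d0 c2t1d0
zpool add pool0 mirror c2t2d0 c2t3d0
```

The controller never sees a multi-disk array; it only contributes its write cache, while ZFS keeps end-to-end checksums and self-healing across the mirror pairs.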
Re: [zfs-discuss] Hardware RAID vs. ZFS RAID
Jesus Cea wrote:
> Vincent Fox wrote:
> | So the point is, a JBOD with a flash drive in one (or two to mirror the ZIL) of the slots would be a lot SIMPLER.
>
> I guess a USB pendrive would be slower than a harddisk. Bad performance for the ZIL.

Does anyone have any data on this?
Re: [zfs-discuss] Hardware RAID vs. ZFS RAID
John-Paul Drawneek wrote:
> I guess a USB pendrive would be slower than a harddisk. Bad performance for the ZIL.
>
> Does anyone have any data on this?

+1 Inquiring minds want to know. :)

-Kyle
Re: [zfs-discuss] Hardware RAID vs. ZFS RAID
Gregory Perry wrote:
> Hello, I have a Dell 2950 with a Perc 5/i and two 300GB 15K SAS drives in a RAID0 array. I am considering going to ZFS, and I would like to get some feedback about which configuration would yield the highest performance: using the Perc 5/i to provide a hardware RAID0 that is presented as a single volume to OpenSolaris, or using the drives separately and creating the RAID0 with OpenSolaris and ZFS? Or maybe just adding the hardware RAID0 to a ZFS pool? Can anyone suggest some articles or FAQs on implementing ZFS RAID? Which configuration would provide the highest read and write throughput?

I'm not sure which will perform best. But giving ZFS the job of doing your redundancy (which with RAID0 it sounds like you're not planning to do) would be better than having the HW do it (if you have enough equipment, having both do it is OK). That said, I had recent discussions on this list about an IBM HW RAID controller with battery-backed cache, and the net result of the discussion seemed to be that with the cache, making each drive into a single-drive RAID0 LUN on the HW RAID (to gain the chance to use the write cache) and then letting ZFS combine the disks in whatever RAID manner you want would give performance benefits, especially if serving the filesystems over NFS is your goal.

-Kyle

> Thanks in advance
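For the 'let ZFS do the RAID0' option being asked about, a striped pool is simply a pool whose top-level vdevs are bare disks; the device names below are illustrative stand-ins for the two SAS drives:

```shell
# With no 'mirror' or 'raidz' keyword, ZFS dynamically stripes writes
# across all top-level vdevs -- RAID0 semantics, no redundancy:
zpool create scratch c1t0d0 c1t1d0
```

Note that either way (HW RAID0 or ZFS stripe), a single drive failure loses the pool; ZFS checksums will still detect corruption but cannot repair it without redundancy.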
Re: [zfs-discuss] Hardware RAID vs. ZFS RAID
I package up 5 or 6 disks into a RAID-5 LUN on our Sun 3510 and 2540 arrays. Then I use ZFS to RAID-10 these volumes. Safety first!

Quite frankly, I've had ENOUGH of rebuilding trashed filesystems. I am tired of chasing performance like it's the Holy Grail and shoving other considerations aside. A ZFS mirror pair can know when there's a bad block and get it from the other one, which is something you will not get with HW RAID.

When Sun starts selling good SAS JBOD boxes equipped with appropriate redundancies and a flash drive or two for the ZIL, I will definitely go that route. For now I have a bunch of existing Sun HW RAID arrays, so I make use of them, mainly to make sure I can package LUNs and that assigned hot-spares are used to rebuild the LUNs when needed.
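The layout described above — ZFS mirroring pairs of hardware RAID-5 LUNs — would look something like the following sketch; the LUN device names are illustrative, not taken from the poster's arrays:

```shell
# Each device below stands for a 5-6 disk RAID-5 LUN exported by a
# 3510/2540. Mirroring LUNs across arrays lets ZFS repair a bad block
# on one LUN from its partner, which HW RAID alone cannot do:
zpool create safe mirror c4t0d0 c5t0d0
zpool add safe mirror c4t1d0 c5t1d0
```

The array firmware handles drive rebuilds inside each LUN with its hot-spares, while ZFS handles checksumming and whole-LUN redundancy on top.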
Re: [zfs-discuss] Hardware RAID vs. ZFS RAID
Vincent Fox wrote:
> When Sun starts selling good SAS JBOD boxes equipped with appropriate redundancies and a flash-drive or 2 for the ZIL I will definitely go that route. For now I have a bunch of existing Sun HW RAID arrays so I make use of them mainly to make sure I can package LUNs and that assigned hot-spares are used to rebuild the LUNs when needed.

Why not a RAID box with enough battery-backed RAM, one that supports enough LUNs that you can make one RAID0 LUN for every drive, and gain the benefit of the battery-backed write cache on every write, not just the ZIL writes? Is the benefit of a fast ZIL enough? Is it so close that the rest of what I describe is a waste? Granted, JBOD plus a flash ZIL might be cheaper. Not having been able to do any testing of my own yet, I'm still struggling to understand the likely performance differences of the two approaches (or a third: the RAID0 idea above plus the flash ZIL).

-Kyle
Re: [zfs-discuss] Hardware RAID vs. ZFS RAID
So the point is, a JBOD with a flash drive in one (or two, to mirror the ZIL) of the slots would be a lot SIMPLER.

We've all spent the last decade or two offloading functions into specialized hardware, which has turned into these massive, unnecessarily complex things. I don't want to go to a new training class every time we buy a new model of storage unit. I don't want to have to set up a new server on my private network to run the Java GUI management software for that array, and all the other BS that array vendors put us through. I just want storage.
Re: [zfs-discuss] Hardware RAID vs. ZFS RAID
Vincent Fox wrote:
| So the point is, a JBOD with a flash drive in one (or two to mirror the ZIL) of the slots would be a lot SIMPLER.

I guess a USB pendrive would be slower than a harddisk. Bad performance for the ZIL.

--
Jesus Cea Avion - [EMAIL PROTECTED] - http://www.argo.es/~jcea/
Re: [zfs-discuss] Hardware RAID vs. ZFS RAID
Vincent Fox wrote:
> So the point is, a JBOD with a flash drive in one (or two to mirror the ZIL) of the slots would be a lot SIMPLER. We've all spent the last decade or two offloading functions into specialized hardware, which has turned into these massive, unnecessarily complex things. I don't want to go to a new training class every time we buy a new model of storage unit. I don't want to have to set up a new server on my private network to run the Java GUI management software for that array and all the other BS that array vendors put us through. I just want storage.

Good point.

-Kyle
Re: [zfs-discuss] Hardware RAID vs. ZFS RAID
Kyle McDonald wrote:
> Vincent Fox wrote:
>> So the point is, a JBOD with a flash drive in one (or two to mirror the ZIL) of the slots would be a lot SIMPLER. [...] I just want storage.
>
> Good point.

You still need interfaces, of some kind, to manage the device. Temp sensors? Drive FRU information? All that information has to go out, and some in, over an interface of some sort.
Re: [zfs-discuss] Hardware RAID vs. ZFS RAID
[EMAIL PROTECTED] said:
> You still need interfaces, of some kind, to manage the device. Temp sensors? Drive FRU information? All that information has to go out, and some in, over an interface of some sort.

Looks like the Sun 2530 array recently added in-band management over the SAS (data) interface. Maybe the SAS/SATA JBOD products will be able to do the same.

So, how long before we can buy an NVRAM cache card or SSD that we can put in a Thumper to make NFS go fast? That, plus the Solaris 10 bits to use them, are what we're in need of.

Regards,

Marion
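For what it's worth, ZFS already has the pool-side plumbing for this: newer builds support dedicated log devices, so once such an NVRAM or SSD card exists it could be attached as a separate ZIL device. The pool and device names below are illustrative:

```shell
# Add a fast device as a dedicated intent-log (slog) vdev; mirror the
# log if losing in-flight synchronous writes would be a problem:
zpool add tank log c6t0d0
# or, mirrored:
zpool add tank log mirror c6t0d0 c6t1d0
```

With the ZIL on the fast device, synchronous NFS writes can be acknowledged as soon as they hit the log, instead of waiting on the main pool disks.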