Hi,

see comments inline:

Lloyd H. Gill wrote:

Hello folks,

I am sure this topic has been asked before, but I am new to this list. I have read a ton of docs on the web, but wanted to get some opinions from you all. Also, if someone has a digest of the last time this was discussed, you can just send that to me. In any case, I am reading a lot of mixed reviews related to ZFS on HW RAID devices.

The Sun docs seem to indicate it is possible, but not a recommended course. I realize there are some advantages, such as snapshots, etc. But the h/w RAID will handle ‘most’ disk problems, basically reducing the great capabilities that are among the big reasons to deploy ZFS. One suggestion would be to create the h/w RAID LUNs as usual, present them to the OS, then do simple striping with ZFS. Here are my two applications, where I am presented with this possibility:

Of course you can use ZFS on disk arrays with RAID done in HW, and you will still be able to use most ZFS features, including snapshots, clones, compression, etc.

It is not recommended only in the sense that unless the pool has a redundant configuration from the ZFS point of view, ZFS won't be able to heal corrupted blocks if they occur (though it will still be able to detect them). Most other filesystems on the market won't even detect such a case, let alone repair it, so if you are OK with not having this great ZFS feature, then go ahead. All the other features of ZFS will work as expected.

Now, if you want to present several LUNs with RAID done in HW, then yes, the best approach usually is to add all of them to a pool in a striped configuration. ZFS will always put 2 or 3 copies of metadata on different LUNs if possible, so you will end up with some protection (self-healing) from ZFS - for metadata at least.
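
For example, assuming the HW-RAID LUNs show up to Solaris as c2t0d0, c2t1d0 and c2t2d0 (hypothetical names - substitute whatever 'format' reports on your host), a simple striped pool is just:

  zpool create tank c2t0d0 c2t1d0 c2t2d0

Any additional LUN presented later can be added to the stripe with 'zpool add tank <device>'.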

Another option (more expensive) is to do RAID-10 or RAID-Z on top of LUNs which are already protected with some RAID level on the disk array. For example, if you presented 4 LUNs, each a RAID-5 device in HW, and then created a pool with 'zpool create test mirror lun1 lun2 mirror lun3 lun4', you would effectively end up with a mirrored RAID-5 (RAID 5+1) configuration. It would of course halve the available logical storage, but it would allow ZFS to do self-healing.
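
If halving the capacity is too expensive, a RAID-Z on top of the same four hypothetical LUNs, i.e.

  zpool create test raidz lun1 lun2 lun3 lun4

would still give ZFS enough redundancy to self-heal data, while costing only one LUN's worth of space instead of half.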

Sun Messaging Environment:
We currently use EMC storage. The storage team manages all Enterprise storage. We currently have 10x300GB UFS mailstores presented to the OS. Each LUN is a HW RAID-5 device. We will be upgrading the application and doing a hardware refresh of this environment, which will give us the chance to move to ZFS, but stay on EMC storage. I am sure the storage team will not want to present us with JBOD. It is their practice to create the HW LUNs and present them to the application teams. I don’t want to end up with a complicated scenario, but would like to leverage the most I can with ZFS while staying on the EMC array as I mentioned.

Just create a pool which stripes across such LUNs.
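
For example, if the refreshed EMC LUNs show up as c3t0d0 through c3t9d0 (hypothetical names - check with 'format' for what your multipathing software actually presents), something like:

  zpool create mail c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t6d0 c3t7d0 c3t8d0 c3t9d0
  zfs create mail/store01

gives you one big striped pool, and each mailstore becomes a filesystem sharing that pool instead of a fixed-size UFS LUN.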




Sun Directory Environment:
The directory team is running HP DL385 G2 servers, which also have a built-in HW RAID controller for 5 internal SAS disks. The team currently has DS5.2 deployed on RHEL3, but as we move to DS6.3.1, they may want to move to Solaris 10. We have an opportunity to move to ZFS in this environment, but I am curious how to best leverage ZFS capabilities in this scenario. JBOD is very clear, but a lot of manufacturers out there are still offering HW RAID technologies with high-speed caches. Using ZFS with these is not very clear to me, and as I mentioned, there are very mixed reviews, not on ZFS features, but on how it’s used in HW RAID settings.

Here you have three options. The first is RAID in HW exposed as one LUN, with a pool created on top of it. ZFS will be able to detect corruption if it happens but won't be able to fix it (at least not for data).

Another option is to present each disk as a single-disk RAID-0 LUN and then do RAID-10 or RAID-Z in ZFS. Most RAID controllers will still use their cache in such a configuration, so you would still benefit from it, and ZFS will be able to detect and fix corruption if it happens. However, the procedure for replacing a failed disk drive could be more complicated, or even require downtime, depending on the controller and whether there is a management tool for it on Solaris (otherwise, on many PCI controllers, when a disk behind a one-disk RAID-0 LUN dies you will have to go into the controller's BIOS and re-create the failed LUN with the new disk). But check your controller - maybe it is not an issue for you, or maybe the procedure is still acceptable.
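
For example, with the five internal disks each exported as a one-disk RAID-0 LUN and visible as c1t0d0 through c1t4d0 (hypothetical names), a RAID-10-style pool with a hot spare could be:

  zpool create ldap mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 spare c1t4d0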

The last option would be to disable the RAID controller, access the disks directly, and do the RAID in ZFS. That way you lose the controller's cache, of course.
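
With the controller out of the way (again hypothetical device names), the whole setup reduces to something like:

  zpool create ldap raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0

and disk replacement becomes a plain 'zpool replace ldap <old-device> <new-device>'.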

If your applications are sensitive to write latency to your LDAP database, then going with one of the first two options could actually prove to be the faster solution (assuming the volume of writes is not so big that the cache is 100% utilized all the time, as then it comes down to the disks anyway).


Another thing you need to consider if you want to use RAID-5 or RAID-Z is your workload. If you are going to issue lots of small random reads in parallel (from multiple threads and/or processes), then in most cases HW RAID-5 will be much faster than RAID-Z, since RAID-Z has to read a whole stripe to verify each block's checksum, so small-random-read IOPS scale with the number of RAID-Z vdevs rather than with the number of disks. In that case, going with RAID-5 in HW and striping (or mirroring) the LUNs would be the better option from a performance point of view.

However, if you are going to do mostly random writes at a constant throughput high enough that the caches on the RAID controllers are fully saturated all the time, then ZFS RAID-Z should prove faster, because RAID-Z turns every write into a full-stripe write and so avoids the read-modify-write penalty that RAID-5 pays for small writes once the cache can no longer absorb them.

If you are somewhere in-between then... well, get it tested... :)
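
Even a rough test tells you a lot: build each candidate pool, replay something resembling your real mail/LDAP load against it, and watch the behaviour with the usual tools, e.g.

  zpool iostat -v <pool> 5
  iostat -xnz 5

then compare latency and throughput at the load levels you actually expect.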

Or maybe you are in the 80% basket of environments where, with modern HW, performance will be acceptable from a practical point of view regardless of the approach you take, and then you should focus on features and ease of management and service.

--
Robert Milkowski
http://milek.blogspot.com

