Nathan,

Keep in mind that the iSCSI target is only in OpenSolaris at this time.

On 05/30/2007 10:15 PM, Nathan Huisman wrote:

<snip>


===== QUESTION #1

What is the best way to mirror two zfs pools in order to achieve a sort
of HA storage system? I don't want to have to physically swap my disks
into another system if any of the hardware on the ZFS server dies. If I
have the following configuration what is the best way to mirror these in
near real time?

BOX 1 (JBOD->ZFS)    BOX 2 (JBOD->ZFS)

I've seen the zfs send and receive commands, but I'm not sure how well
that would work for a close-to-real-time mirror.

If you want this to be redundant (and very scalable) you will want at least 2x BOX 1 and 2x BOX 2, plus IPMP with redundant NICs and GbE switches.

Do not use zfs send/recv. Use Sun Cluster 3.2 for HA-ZFS.

http://docs.sun.com/app/docs/doc/820-0335/6nc35dge2?a=view
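
Roughly, the HA-ZFS piece comes down to putting the zpool under an HAStoragePlus resource so the cluster imports it on whichever node is active. A rough sketch only (pool, group, and resource names are just examples; see the doc above for the real procedure):

  # on one cluster node, after the zpool (here 'tank') exists
  clresourcetype register SUNW.HAStoragePlus
  clresourcegroup create tank-rg
  clresource create -g tank-rg -t SUNW.HAStoragePlus -p Zpools=tank tank-hasp-rs
  clresourcegroup online -M tank-rg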

There is potential for data loss if the active ZFS node crashes before outstanding transaction groups commit for non-synchronous writes, but the ZVOL (and underlying ext3fs) should not become corrupt (hasn't happened to me yet). Can someone from the ZFS team comment on this?



===== QUESTION #2

Can ZFS be exported via iSCSI and then imported as a disk to a Linux
system and then be formatted with another file system? I wish to use ZFS
as a block-level file system for my virtual machines, specifically
using Xen. If this is possible, how stable is this?

This is possible and is stable in my experience. Scales well if you design your infrastructure correctly.
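
For reference, the basic flow looks something like this (pool/volume names, size, and the 'zfsbox' hostname are just examples):

  # On the ZFS box (OpenSolaris): create a ZVOL and export it as an iSCSI target
  zfs create -V 100g tank/xen/domu01
  zfs set shareiscsi=on tank/xen/domu01
  iscsitadm list target -v          # note the target IQN

  # On the Linux dom0 (open-iscsi): discover, log in, and format
  iscsiadm -m discovery -t sendtargets -p zfsbox
  iscsiadm -m node -T <target-iqn> -p zfsbox --login
  mkfs.ext3 /dev/sdb                # whatever device the new LUN shows up as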

How is error
checking handled if the zfs is exported via iSCSI and then the block
device formatted as ext3? Will zfs still be able to check for errors?

Yes, ZFS will detect and correct block-level errors in ZVOLs as long as you have a redundant zpool configuration (see the note below about LVM).
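
You can watch this from the ZFS node, e.g.:

  zpool status -v tank     # per-device read/write/checksum error counters
  zpool scrub tank         # walk every block in the pool, repairing from the good mirror half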

If this is possible and this all works, then are there ways to expand a
zfs iscsi exported volume and then expand the ext3 file system on the
remote host?


Haven't tested it myself (yet), but should be possible. You might have to export and re-import the iSCSI target on the Xen dom0 and then resize the ext3 partition (e.g. using 'parted'). If that doesn't work there are other ways to accomplish this.
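
An untested sketch of what I'd expect that to look like (volume name and size are examples):

  # On the ZFS node: grow the ZVOL
  zfs set volsize=200g tank/xen/domu01

  # On the Linux dom0: make the initiator see the new size
  # (rescan the session, or log out and back in), then grow ext3.
  # If you put a partition table on the LUN, grow the partition first (e.g. with parted).
  iscsiadm -m node -T <target-iqn> -R
  resize2fs /dev/sdb       # may need to be unmounted first, depending on kernel/ext3 support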

===== QUESTION #3

How does zfs handle a bad drive? What process must I go through in
order to take out a bad drive and replace it with a good one?

If you have a redundant zpool configuration you will replace the failed disk and then issue a 'zpool replace'.
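
For example (device name is illustrative):

  # after physically swapping in a good disk at the same location
  zpool replace tank c2t3d0
  zpool status tank        # watch the resilver progress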


===== QUESTION #4

What is a good way to back up this HA storage unit? Snapshots will
provide an easy way to do it live, but should it be dumped into a tape
library, or a third offsite zfs pool using zfs send/receive, or ...?

Send snapshots to another server that has a RAIDZ (or RAIDZ2) zpool (for backup you want space over performance/redundancy, the opposite of the *MIRRORS* you will want between the HA-ZFS cluster and the storage nodes). From this node you can dump to tape, etc.
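
The snapshot/send side of that is straightforward (dataset and host names are examples):

  # On a ZFS node: snapshot, then send to the backup box
  zfs snapshot tank/xen/domu01@20070601
  zfs send tank/xen/domu01@20070601 | ssh backupbox zfs receive backup/domu01

  # Later runs can be incremental between two snapshots
  zfs send -i tank/xen/domu01@20070601 tank/xen/domu01@20070608 | \
      ssh backupbox zfs receive backup/domu01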


===== QUESTION #5

Does the following setup work?

BOX 1 (JBOD) -> iscsi export -> BOX 2 ZFS.

In other words, can I set up a bunch of thin storage boxes with low CPU
and RAM instead of using SAS or FC to supply the JBOD to the ZFS server?

Yes. And ZFS+iSCSI makes this relatively cheap. I very strongly recommend against using LVM to handle the mirroring. *You will lose the ability to correct data corruption* at the ZFS level. It also does not scale well, increases complexity, increases cost, and reduces throughput over iSCSI to your ZFS nodes. Leave volume management and redundancy to ZFS.

Set up your Xen dom0 boxes to have a redundant path to your ZVOLs over iSCSI. Send your data _one time_ to your ZFS nodes; let ZFS handle the mirroring and write it out to the iSCSI LUNs on the storage nodes. Make sure each mirror in the zpool is built with one half from each of two separate storage nodes.
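
In other words, when you create the pool on a ZFS node, pair up LUNs from different storage boxes in each mirror vdev. Something like this (device names are made up; the iSCSI LUNs will actually show up with long cXtWWNdN names):

  # each mirror gets one LUN from storage box A and one from storage box B
  zpool create tank \
      mirror c3t0d0 c4t0d0 \
      mirror c3t1d0 c4t1d0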

Be wary of layering ZFS/ZVOLs like this. There are multiple ways to set up your storage nodes (plain iscsitadm or using ZVOLs), and if you use ZVOLs you may want to disable checksums there and leave that to your ZFS nodes.
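
I.e., if the storage nodes themselves run ZFS and export ZVOLs, something like this on the storage node (dataset name is an example):

  # avoid checksumming the same data twice; the ZFS nodes still checksum end to end
  zfs set checksum=off storagepool/lun0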

Other:
-Others have reported that Sil3124-based SATA expansion cards work well with Solaris.
-Test your failover times between ZFS nodes (BOX 2s). Having lots of iSCSI shares/filesystems can make this slow. Hopefully this will be improved with parallel zpool device mounting in the future.
-ZVOLs are not sparse by default. I prefer this, but if you really want sparse ZVOLs there is a switch for it in 'zfs create'.
-This will work, but TEST, TEST, TEST for your particular scenario.
-Yes, this can be built for less than $30k US for your storage size requirement.
-I get ~150MB/s throughput on this setup with 2 storage nodes of 6 disks each. It appears as a ~3TB mirror on the ZFS nodes.
-Use build 64 or later, as there is a ZVOL bug in b63 if I'm not mistaken. Probably a good idea to read through the open ZFS bugs, too.
-Rule of thumb: about 1GHz of CPU is needed to saturate a 1GbE link.
-Run 64-bit Solaris and give the ZFS nodes as much RAM as possible.
-Read the documentation.
-...
-Profit  ;)

David Anderson


