Jim Dunham wrote:
Robert,
Hello Ben,

Monday, February 5, 2007, 9:17:01 AM, you wrote:

BR> I've been playing with replication of a ZFS Zpool using the
BR> recently released AVS.  I'm pleased with things, but just
BR> replicating the data is only part of the problem.  The big
BR> question is: can I have a zpool open in 2 places?

BR> What I really want is a Zpool on node1 open and writable
BR> (production storage) and a replicated to node2 where its open for
BR> read-only access (standby storage).

BR> This is an old problem.  I'm not sure it's remotely possible.  It's
BR> bad enough with UFS, but ZFS maintains a hell of a lot more
BR> meta-data.  How is node2 supposed to know that a snapshot has been
BR> created, for instance?  With UFS you can at least get by some of
BR> these problems using directio, but that's not an option with a zpool.

BR> I know this is a fairly remedial issue to bring up... but if I
BR> think about what I want Thumper-to-Thumper replication to look
BR> like, I want 2 usable storage systems.  As I see it now, the
BR> secondary storage (node2) is useless until you break replication
BR> and import the pool, do your thing, and then re-sync storage to re-enable replication.

BR> Am I missing something? I'm hoping there is an option I'm not aware of.


You can't mount rw on one node and ro on another (not to mention that
ZFS doesn't offer read-only pool imports right now). With a file system
like UFS you can mount the same file system read-only on both nodes, but
not with ZFS (no ro import).
One cannot just mount a filesystem in RO mode if SNDR or any other host-based or controller-based replication is underneath. For every filesystem that I know of, except of course shared-reader QFS, this will fail given time.
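
(As a rough illustration, and assuming an SNDR set has been configured for the volumes under the pool, you can see on the secondary whether those volumes are still being written to before any import attempt:

    # on node2, the SNDR secondary
    sndradm -P        # brief status: each set reports a state such as
                      # "replicating", "syncing", or "logging"

While a set reports "replicating" or "syncing", the devices under the zpool are a moving target, and any mount or import, read-only or not, is exposed to the old "A" / new "B" problem described in the next paragraph.)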

Even if one has the means to mount a filesystem with DIRECTIO (no-caching), READ-ONLY (no-writes), it does not prevent a filesystem from looking at the contents of block "A" and then acting on block "B". The reason is that during replication at time T1 both blocks "A" & "B" could be written and be consistent with each other. Next the file system reads block "A". Now replication at time T2 updates blocks "A" & "B", also consistent with each other. Next the file system reads block "B" and panics due to an inconsistency only it sees between old "A" and new "B". I know this for a fact, since a forced "zpool import -f <name>" is a common instance of this exact failure, most likely due to checksum failures between metadata blocks "A" & "B".

Ya, that bit me last night. 'zpool import' shows the pool fine, but when you force the import you panic:

Feb  5 07:14:10 uma ^Mpanic[cpu0]/thread=fffffe8001072c80:
Feb  5 07:14:10 uma genunix: [ID 809409 kern.notice] ZFS: I/O failure (write on <unknown> off 0: zio fffffe80c54ed380 [L0 unallocated] 400L/200P DVA[0]=<0:360000000:200> DVA[1]=<0:9c0003800:200> DVA[2]=<0:20004e00:200> fletcher4 lzjb LE contiguous birth=57416 fill=0 cksum=de2e56ffd:5591b77b74b:1101a91d58dfc:252efdf22532d0): error 5
Feb  5 07:14:11 uma unix: [ID 100000 kern.notice]
Feb  5 07:14:11 uma genunix: [ID 655072 kern.notice] fffffe8001072a40 zfs:zio_done+140 ()
Feb  5 07:14:11 uma genunix: [ID 655072 kern.notice] fffffe8001072a60 zfs:zio_next_stage+68 ()
Feb  5 07:14:11 uma genunix: [ID 655072 kern.notice] fffffe8001072ab0 zfs:zio_wait_for_children+5d ()
Feb  5 07:14:11 uma genunix: [ID 655072 kern.notice] fffffe8001072ad0 zfs:zio_wait_children_done+20 ()
Feb  5 07:14:11 uma genunix: [ID 655072 kern.notice] fffffe8001072af0 zfs:zio_next_stage+68 ()
Feb  5 07:14:11 uma genunix: [ID 655072 kern.notice] fffffe8001072b40 zfs:zio_vdev_io_assess+129 ()
Feb  5 07:14:11 uma genunix: [ID 655072 kern.notice] fffffe8001072b60 zfs:zio_next_stage+68 ()
Feb  5 07:14:11 uma genunix: [ID 655072 kern.notice] fffffe8001072bb0 zfs:vdev_mirror_io_done+2af ()
Feb  5 07:14:11 uma genunix: [ID 655072 kern.notice] fffffe8001072bd0 zfs:zio_vdev_io_done+26 ()
Feb  5 07:14:11 uma genunix: [ID 655072 kern.notice] fffffe8001072c60 genunix:taskq_thread+1a7 ()
Feb  5 07:14:11 uma genunix: [ID 655072 kern.notice] fffffe8001072c70 unix:thread_start+8 ()
Feb  5 07:14:11 uma unix: [ID 100000 kern.notice]

So without using II, what's the best method of bringing up the secondary storage? Is just dropping the primary into logging acceptable?
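
Something along these lines is what I had in mind -- just a sketch, the set name "zpool-set" and pool name "tank" are placeholders, and whether a clean drop into logging leaves the secondary consistent enough for ZFS is exactly the part I'm unsure about:

    # on node1 (primary): stop pushing updates, scoreboard changes locally
    sndradm -n -l zpool-set

    # on node2 (secondary): volumes are now quiescent, bring the pool up
    zpool import                  # pool should show up
    zpool import -f tank          # -f because it was last active on node1

    # later: give the pool back and catch the secondary up again
    zpool export tank             # on node2
    sndradm -n -u zpool-set       # on node1: update resync primary -> secondary

Anything written on node2 while the set is in logging would of course get overwritten by the resync.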

benr.
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
