On Sep 16, 2008, at 5:39 PM, Miles Nordin wrote:

>>>>>> "jd" == Jim Dunham <[EMAIL PROTECTED]> writes:
>
>    jd> If at the time the SNDR replica is deleted the set was
>    jd> actively replicating, along with ZFS actively writing to the
>    jd> ZFS storage pool, I/O consistency will be lost, leaving ZFS
>    jd> storage pool in an indeterministic state on the remote node.
>
>    jd> To address this issue, prior to deleting the replicas, the
>    jd> replica should be placed into logging mode first.
>
> What if you stop the replication by breaking the network connection
> between primary and replica?  consistent or inconsistent?

Consistent.

> it sounds fishy, like ``we're always-consistent-on-disk with ZFS, but
> please use 'zpool offline' to avoid disastrous pool corruption.''

This is not the case at all.

Maintaining I/O consistency across all volumes in a single I/O
consistency group is an attribute of replication. The instant an SNDR
replica is deleted, that volume is no longer being replicated, and it
becomes inconsistent with all the other write-ordered volumes. By first
placing every volume in the I/O consistency group into logging mode
(not 'zpool offline'), and only then deleting the replicas, there is no
way for any of the remote volumes to become I/O inconsistent.

Yes, one will note that there is a group disable command ("sndradm -g
<group-name> -d"), but it was implemented for ease of administration,
not as a write-order-coordinated disable.
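
As a rough sketch (the group name "tank" is only a placeholder, and the
flags should be double-checked against the sndradm man page), the
write-order-safe sequence is:

  # Put every set in the I/O consistency group into logging mode first
  sndradm -n -g tank -l

  # Only then delete (disable) the replicas in the group
  sndradm -n -g tank -d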

>    jd> ndr_ii. This is an automatic snapshot taken before
>    jd> resynchronization starts,
>
> yeah that sounds fine, possibly better than DRBD in one way because it
> might allow the resync to go faster.
>
> From the PDF's it sounds like async replication isn't done the same
> way as the resync, it's done safely, and that it's even possible for
> async replication to accumulate hours of backlog in a ``disk queue''
> without losing write ordering so long as you use the ``blocking mode''
> variant of async.

Correct reading of the documentation.
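
For what it's worth, as best I recall the options (please verify
against the sndradm man page; the queue device path here is made up), a
blocking-mode disk queue is attached to a group along these lines:

  # Attach a disk queue volume to the consistency group "tank"
  sndradm -g tank -q a /dev/rdsk/c2t0d0s4

  # Select the blocking variant of async, which preserves write
  # ordering even while the queue accumulates a backlog
  sndradm -g tank -D block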

> ii might also be good for debugging a corrupt ZFS, so you can tinker
> with it but still roll back to the original corrupt copy.  I'll read
> about it---I'm guessing I will need to prepare ahead of time if I want
> ii available in the toolbox after a disaster.
>
>    jd> AVS has the concept of I/O consistency groups, where all disks
>    jd> of a multi-volume filesystem (ZFS, QFS) or database (Oracle,
>    jd> Sybase) are kept write-order consistent when using either sync
>    jd> or async replication.
>
> Awesome, so long as people know to use it.  so I guess that's the
> answer for the OP: use consistency groups!

I use the name of the ZFS storage pool as the name of the SNDR I/O
consistency group.
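
For example (hostnames, device paths, and the pool/group name "tank"
are placeholders, and the argument order is from memory, so check the
docs before relying on it), each device backing the pool gets enabled
into the same group:

  # Enable replication for one pool device, placing the set in group "tank"
  sndradm -n -e primaryhost /dev/rdsk/c1t0d0s0 /dev/rdsk/c1t0d0s1 \
             secondaryhost /dev/rdsk/c1t0d0s0 /dev/rdsk/c1t0d0s1 \
             ip async g tank

  # Repeat for every device that backs the zpool "tank"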

> The one thing I worry about is, before, AVS was used between RAID and
> filesystem, which is impossible now because that inter-layer area no
> longer exists.  If you put the individual device members of a
> redundant zpool vdev into an AVS consistency group, what will AVS do
> when one of the devices fails?

Nothing, as it is ZFS that reacts to the failed device.

> Does it continue replicating the working devices and ignore the  
> failed one?

In this scenario ZFS knows the device failed, which means ZFS will stop  
writing to the disk, and thus to the replica.


> This would sacrifice redundancy at the DR site.  UFS-AVS-RAID
> would not do that in the same situation.
>
> Or hide the failed device from ZFS and slow things down by sending all
> read/writes of the failed device to the remote mirror?  This would
> slow down the primary site.  UFS-AVS-RAID would not do that in the
> same situation.
>
> The latter ZFS-AVS behavior might be rescueable, if ZFS had the
> statistical read-preference feature.  but writes would still be
> massively slowed with this scenario, while in UFS-AVS-RAID they would
> not be.  To get back the level of control one used to have for writes,
> you'd need a different zpool-level way to achieve the intent of the
> AVS sync/async option.  Maybe just a slog which is not AVS-replicated
> would be enough, modulo other ZFS fixes for hiding slow devices.

ZFS-AVS is not UFS-AVS-RAID, and although one can foresee some
downsides to replicating ZFS with AVS, there are some big wins:

- Place SNDR into logging mode, zpool scrub the secondary volumes to
  verify consistency, then resume replication (see the sketch below).
- Compressed ZFS storage pools result in compressed replication.
- Encrypted ZFS storage pools result in encrypted replication.
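
A rough sequence for that first point, again with a placeholder
pool/group name "tank" and from memory, so treat it as a sketch rather
than a recipe:

  # On the primary: quiesce the group so the secondary is write-order
  # consistent
  sndradm -n -g tank -l

  # On the secondary: import the pool, scrub it, then export it again
  # (import may need -f, since the pool was never exported on the primary)
  zpool import tank
  zpool scrub tank
  zpool status tank      # wait for the scrub to complete, check for errors
  zpool export tank

  # Back on the primary: resume replication with an update (fast) resync
  sndradm -n -g tank -u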



Jim Dunham
Engineering Manager
Storage Platform Software Group
Sun Microsystems, Inc.

