Kristof,

> Jim, yes, in step 5 the commands were executed on both nodes.
>
> We did some more tests with opensolaris 2008.11. (build 101b)
>
> We managed to get AVS setup up and running, but we noticed that  
> performance was really bad.
>
> When we configured a zfs volume for replication, we noticed that  
> write performance went down from 50 MB/s to 5 MB/sec.

SNDR replication has three modes of operation, and write I/O
performance differs considerably between them. They are:

1). Logging mode - As primary volume write I/Os occur, the bitmap  
volume is used to scoreboard unreplicated write I/Os, at which time  
the write I/O completes.

2). Resynchronization mode - A resynchronization thread traverses the  
scoreboard, in block order, replicating write I/Os for each bit set.  
Concurrently, as primary volume write I/Os occur, the bitmap volume is  
used to scoreboard unreplicated write I/Os. For write I/Os that occur  
(block order wise) after the resynchronization point, the write I/O  
completes. For writes I/Os the occur before the resynchronization  
point, they must be synchronously replicated in place. At the start of  
resynchronization, almost all write I/Os complete quickly, as they  
occur after the resynchronization point. As resynchronization nears  
completion, almost all write I/Os complete slowly, as they occur  
before the resynchronization point. When the resynchronization point  
reaches the end of the scoreboard, the SNDR primary and secondary  
volumes are now 100% identical, write-order consistent, and  
asynchronous replication begins.

3). Replication mode - Primary volume write I/Os are queued on
SNDR's memory queue (or an optionally configured disk queue) and
scoreboarded for replication, at which time the write I/O completes.
In the background, multiple asynchronous flusher threads dequeue
unreplicated write I/Os from SNDR's memory or disk queue and replicate
them to the secondary node.

On configurations with ample system resources, write performance for  
both logging mode and replication mode should be nearly identical.
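As an aside, the current mode of a given set can be checked from the
command line. A minimal sketch, assuming an already configured SNDR
set and that the Availability Suite tools are installed:

# sndradm -P          (lists each set and its state: logging, syncing
                       or replicating)
# dsstat -m sndr 5    (per-set replication I/O statistics, refreshed
                       every 5 seconds)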

The duration that a replica spends in resynchronization mode is
influenced by the amount of write I/O that occurred while the replica
was in logging mode, the amount of primary volume write I/O issued
while resynchronization is active, the network bandwidth and latency
between the primary and secondary nodes, and the I/O performance of
the remote node's secondary volume.
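As a hedged sketch (by default these commands act on every configured
set; a specific set or group name can also be given), a replica that
has been sitting in logging mode is typically brought back with an
update synchronization, which replays only the scoreboarded blocks:

# sndradm -n -u       (update sync: replicate only the blocks flagged
                       in the bitmap)
# sndradm -n -w       (wait until the synchronization completes)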

First time synchronization, done after the SNDR enable "sndradm -e
...", is identical to resynchronization, except that the bitmap volume
is intentionally set to ALL ones, forcing every block to be replicated
from primary to secondary. If instead one configures replication
before the initial "zpool create", the SNDR primary and secondary
volumes both contain uninitialized data and can therefore be
considered equal, so no synchronization is needed. This is
accomplished by using the "sndradm -E ..." option, which sets the
bitmap volume to ALL zeros. The switch from logging mode to
replication mode is then nearly instant.
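A minimal sketch of that ordering, using hypothetical host and device
names (the general form is primary host, data volume, bitmap volume,
then the same three for the secondary):

# sndradm -nE nodeA /dev/rdsk/c1t1d0s0 /dev/rdsk/c1t1d0s1 \
              nodeB /dev/rdsk/c1t1d0s0 /dev/rdsk/c1t1d0s1 ip async
# zpool create gold c1t1d0s0              (run on the primary, nodeA)

Because "-E" marked both sides as already equal, the writes issued by
"zpool create" are simply replicated in replication mode; no full
synchronization pass is needed.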

If one has a ZFS storage pool, plus available storage that can be
provisioned as zpool replacement volumes, these replacement volumes
can be enabled first with "sndradm -E ...". When the "zpool
replace ..." command is then invoked, the write I/Os that ZFS issues
to populate the replacement volume cause SNDR to replicate only those
write I/Os. This operation is done under SNDR's replication mode, not
synchronization mode, and it is also a ZFS background operation. Once
the zpool replace is complete, the previously used storage can be
reclaimed.
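Again as a rough sketch with hypothetical pool and device names, where
"tank" is an existing unreplicated pool on c1t0d0s0 and c2t0d0s0 is
the SNDR-enabled replacement volume:

# sndradm -nE nodeA /dev/rdsk/c2t0d0s0 /dev/rdsk/c2t0d0s1 \
              nodeB /dev/rdsk/c2t0d0s0 /dev/rdsk/c2t0d0s1 ip async
# zpool replace tank c1t0d0s0 c2t0d0s0
# zpool status tank                       (watch the resilver; SNDR
                                           replicates the same writes)

Once the resilver finishes, the previously used, unreplicated device
is free to be reclaimed.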


> A few notes about our test setup:
>
> *  Since replication is configured in logging mode, there is zero  
> network traffic
> *  Since rdc_bitmap_mode has been configured for memory, and even
> more, since the bitmap device is a ramdisk, any data IO on the
> replicated volume results only in a single memory bit flip (per 32k
> of disk space)
> * This setup is the bare minimum in the sense that the kernel driver
> only hooks disk writes and flips a bit in memory; it cannot go any
> faster!

Was the following 'test' run during resynchronization mode or
replication mode?

> The Test
>
> * All tests were performed using the following command line
> # dd if=/dev/zero of=/dev/zvol/rdsk/gold/xxVolNamexx oflag=dsync  
> bs=256M count=10
>
> * Option 'dsync' is chosen to try to avoid zfs's aggressive caching.
> Moreover, a couple of runs were usually launched initially
> to fill the zfs cache and to force real writing to disk
> * Option 'bs=256M' was used in order to avoid the overhead of  
> copying multiple small blocks to kernel memory before disk writes. A  
> larger bs size ensures max throughput. Smaller values were used  
> without much difference though

