Re: [zfs-discuss] [storage-discuss] AVS on opensolaris 2008.11
Richard Elling wrote:
> Jim Dunham wrote:
>> Ahmed,
>>
>>> The setup is not there anymore; however, I will share as much detail
>>> as I have documented. Could you please post the commands you have used
>>> and any differences you think might be important? Did you ever test
>>> with 2008.11 instead of SXCE?
>>
>> Specific to the following:
>>
>>>> While we should be getting a minimal performance hit (hopefully),
>>>> we got a big performance hit; disk throughput was reduced to almost
>>>> 10% of the normal rate.
>>
>> It looks like I need to test on OpenSolaris 2008.11, not Solaris
>> Express CE (b105), since this version does not have access to a
>> version of 'dd' with an oflag= setting.
>>
>> # dd if=/dev/zero of=/dev/zvol/rdsk/gold/xxVolNamexx oflag=dsync bs=256M count=10
>> dd: bad argument: "oflag=dsync"
>
> Congratulations! You've been bitten by the gnu-compatibility feature!

Oh, so that's what one calls it... a feature?

> SXCE and OpenSolaris have more than one version of dd. The difference
> is that OpenSolaris sets your default PATH to use /usr/gnu/bin/dd,
> which has the oflag option, while SXCE sets your default PATH to use
> /usr/bin/dd.

Thank you, Jim

> -- richard
>
>> Using a setting of 'oflag=dsync' will have performance implications.
>>
>> Also, there is an issue with an I/O size of bs=256M. SNDR's internal
>> architecture has an I/O unit chunk size of one bit per 32KB.
>> Therefore, when doing an I/O of 256MB, this results in the need to
>> set 8192 bits, 1024 bytes, or 1KB of data to 0xFF. Although testing
>> with an I/O size of 256MB is interesting, typical I/O tests are more
>> like the following:
>> http://www.opensolaris.org/os/community/performance/filebench/quick_start/
>>
>> - Jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
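Richard's PATH note is easy to probe from a script. A minimal sketch, assuming the /usr/gnu/bin/dd path from his message (on a Linux box the default dd is already GNU, so the probe succeeds there too):

```shell
# Probe whether the dd found on PATH understands the GNU-style oflag=
# operand; the Solaris /usr/bin/dd rejects it with "bad argument".
if dd if=/dev/zero of=/dev/null bs=512 count=1 oflag=dsync 2>/dev/null; then
    echo "dd on PATH supports oflag="
else
    # On OpenSolaris 2008.11 the GNU binary can be invoked explicitly:
    echo "no oflag= support; try /usr/gnu/bin/dd instead"
fi
```

Invoking the binary by full path sidesteps the PATH ordering difference between the two distributions entirely.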
Re: [zfs-discuss] [storage-discuss] AVS on opensolaris 2008.11
Jim Dunham wrote:
> Ahmed,
>
>> The setup is not there anymore; however, I will share as much detail
>> as I have documented. Could you please post the commands you have used
>> and any differences you think might be important? Did you ever test
>> with 2008.11 instead of SXCE?
>
> Specific to the following:
>
>>> While we should be getting a minimal performance hit (hopefully), we
>>> got a big performance hit; disk throughput was reduced to almost 10%
>>> of the normal rate.
>
> It looks like I need to test on OpenSolaris 2008.11, not Solaris
> Express CE (b105), since this version does not have access to a
> version of 'dd' with an oflag= setting.
>
> # dd if=/dev/zero of=/dev/zvol/rdsk/gold/xxVolNamexx oflag=dsync bs=256M count=10
> dd: bad argument: "oflag=dsync"

Congratulations! You've been bitten by the gnu-compatibility feature!

SXCE and OpenSolaris have more than one version of dd. The difference
is that OpenSolaris sets your default PATH to use /usr/gnu/bin/dd,
which has the oflag option, while SXCE sets your default PATH to use
/usr/bin/dd.

-- richard

> Using a setting of 'oflag=dsync' will have performance implications.
>
> Also, there is an issue with an I/O size of bs=256M. SNDR's internal
> architecture has an I/O unit chunk size of one bit per 32KB.
> Therefore, when doing an I/O of 256MB, this results in the need to
> set 8192 bits, 1024 bytes, or 1KB of data to 0xFF. Although testing
> with an I/O size of 256MB is interesting, typical I/O tests are more
> like the following:
> http://www.opensolaris.org/os/community/performance/filebench/quick_start/
>
> - Jim
Re: [zfs-discuss] [storage-discuss] AVS on opensolaris 2008.11
Ahmed,

> The setup is not there anymore; however, I will share as much detail
> as I have documented. Could you please post the commands you have used
> and any differences you think might be important? Did you ever test
> with 2008.11 instead of SXCE?

Specific to the following:

>>> While we should be getting a minimal performance hit (hopefully), we
>>> got a big performance hit; disk throughput was reduced to almost 10%
>>> of the normal rate.

It looks like I need to test on OpenSolaris 2008.11, not Solaris
Express CE (b105), since this version does not have access to a
version of 'dd' with an oflag= setting.

# dd if=/dev/zero of=/dev/zvol/rdsk/gold/xxVolNamexx oflag=dsync bs=256M count=10
dd: bad argument: "oflag=dsync"

Using a setting of 'oflag=dsync' will have performance implications.

Also, there is an issue with an I/O size of bs=256M. SNDR's internal
architecture has an I/O unit chunk size of one bit per 32KB.
Therefore, when doing an I/O of 256MB, this results in the need to set
8192 bits, 1024 bytes, or 1KB of data to 0xFF. Although testing with
an I/O size of 256MB is interesting, typical I/O tests are more like
the following:
http://www.opensolaris.org/os/community/performance/filebench/quick_start/

- Jim
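Jim's scoreboard arithmetic above checks out with plain shell arithmetic (the one-bit-per-32KB granularity is the figure from his message):

```shell
# One SNDR scoreboard bit covers a 32 KB chunk, so a single bs=256M
# write from the dd test touches this many bits:
io_bytes=$((256 * 1024 * 1024))
chunk_bytes=$((32 * 1024))
bits=$((io_bytes / chunk_bytes))
echo "${bits} bits = $((bits / 8)) bytes = $((bits / 8 / 1024)) KB of bitmap"
# prints: 8192 bits = 1024 bytes = 1 KB of bitmap
```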
Re: [zfs-discuss] [storage-discuss] AVS on opensolaris 2008.11
Hi Jim,

The setup is not there anymore; however, I will share as much detail
as I have documented. Could you please post the commands you have used
and any differences you think might be important? Did you ever test
with 2008.11 instead of SXCE? I will probably be testing again soon.
Any tips or obvious errors are welcome :)

->8-
The Setup

* A 100G zvol has been set up on each node of an AVS replicating pair
* A "ramdisk" has been set up on each node using
  ramdiskadm -a ram1 10m
* The replication relationship has been set up using
  sndradm -E pri /dev/zvol/rdsk/gold/myzvol /dev/rramdisk/ram1 sec /dev/zvol/rdsk/gold/myzvol /dev/rramdisk/ram1 ip async
* The AVS driver was configured not to log the disk bitmap to disk,
  but rather to keep it in kernel memory and write it to disk only
  upon machine shutdown. This is configured as such:
  # grep bitmap_mode /usr/kernel/drv/rdc.conf
  rdc_bitmap_mode=2;
* The replication was configured to be in logging mode:
  # sndradm -P
  /dev/zvol/rdsk/gold/myzvol <- pri:/dev/zvol/rdsk/gold/myzvol
  autosync: off, max q writes: 4096, max q fbas: 16384, async threads: 2, mode: async, state: logging

Testing was done with:

dd if=/dev/zero of=/dev/zvol/rdsk/gold/xxVolNamexx oflag=dsync bs=256M count=10

* Option 'dsync' is chosen to try to avoid ZFS's aggressive caching.
  Moreover, a couple of runs were usually launched initially to fill
  the ZFS cache and to force real writing to disk
* Option 'bs=256M' was used in order to avoid the overhead of copying
  multiple small blocks to kernel memory before disk writes. A larger
  bs ensures max throughput. Smaller values were used without much
  difference, though

The results on multiple runs:

Non-replicated vol throughputs: 42.2, 52.8, 50.9 MB/s
Replicated vol throughputs: 4.9, 5.5, 4.6 MB/s
-->8-

Regards

On Mon, Jan 26, 2009 at 1:22 AM, Jim Dunham wrote:
> Ahmed,
>
>> Thanks for your informative reply. I am involved with kristof
>> (original poster) in the setup, please allow me to reply below
>>
>>> Was the following 'test' run during resynchronization mode or
>>> replication mode?
>>
>> Neither, testing was done while in logging mode. This was chosen to
>> simply avoid any network "issues" and to get the setup working as
>> fast as possible. The setup was created with:
>>
>> sndradm -E pri /dev/zvol/rdsk/gold/myzvol /dev/rramdisk/ram1 sec
>> /dev/zvol/rdsk/gold/myzvol /dev/rramdisk/ram1 ip async
>>
>> Note that the logging disks are ramdisks, again trying to avoid disk
>> contention and get the fastest performance (reliability is not a
>> concern in this test). Before running the tests, this was the state:
>>
>> # sndradm -P
>> /dev/zvol/rdsk/gold/myzvol <- pri:/dev/zvol/rdsk/gold/myzvol
>> autosync: off, max q writes: 4096, max q fbas: 16384, async threads:
>> 2, mode: async, state: logging
>>
>> While we should be getting a minimal performance hit (hopefully), we
>> got a big performance hit; disk throughput was reduced to almost 10%
>> of the normal rate.
>
> Is it possible to share information on your ZFS storage pool
> configuration, your testing tool, testing types, and resulting data?
>
> I just downloaded Solaris Express CE (b105)
> http://opensolaris.org/os/downloads/sol_ex_dvd_1/, configured ZFS in
> various storage pool types, SNDR with and without RAM disks, and I do
> not see that disk throughput was reduced to almost 10% of the normal
> rate. Yes, there is some performance impact, but nowhere near the
> amount reported.
>
> There are various factors which could come into play here, but the
> most obvious reason that someone may see a serious performance
> degradation as reported is that, prior to SNDR being configured, the
> existing system under test was already maxed out on some system
> limitation, such as CPU or memory. I/O impact should not be a factor,
> given that a RAM disk is used. The addition of both SNDR and a RAM
> disk in the data path, regardless of how small their system cost is,
> will then have a profound impact on disk throughput.
>
> Jim
>
>> Please feel free to ask for any details, thanks for the help
>>
>> Regards
>> ___
>> storage-discuss mailing list
>> storage-disc...@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/storage-discuss
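For what it's worth, the runs reported in the thread do average out to the "almost 10%" figure; a quick awk check over the quoted numbers:

```shell
# Averages of the throughput runs reported in the thread (MB/s);
# awk is used because plain shell arithmetic is integer-only.
awk 'BEGIN {
    plain = (42.2 + 52.8 + 50.9) / 3   # non-replicated volume
    repl  = (4.9 + 5.5 + 4.6) / 3      # replicated volume, logging mode
    printf "non-replicated avg: %.1f MB/s\n", plain
    printf "replicated avg:     %.1f MB/s\n", repl
    printf "ratio:              %.0f%% of baseline\n", 100 * repl / plain
}'
```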
Re: [zfs-discuss] [storage-discuss] AVS on opensolaris 2008.11
Ahmed,

> Thanks for your informative reply. I am involved with kristof
> (original poster) in the setup, please allow me to reply below
>
>> Was the following 'test' run during resynchronization mode or
>> replication mode?
>
> Neither, testing was done while in logging mode. This was chosen to
> simply avoid any network "issues" and to get the setup working as
> fast as possible. The setup was created with:
>
> sndradm -E pri /dev/zvol/rdsk/gold/myzvol /dev/rramdisk/ram1 sec
> /dev/zvol/rdsk/gold/myzvol /dev/rramdisk/ram1 ip async
>
> Note that the logging disks are ramdisks, again trying to avoid disk
> contention and get the fastest performance (reliability is not a
> concern in this test). Before running the tests, this was the state:
>
> # sndradm -P
> /dev/zvol/rdsk/gold/myzvol <- pri:/dev/zvol/rdsk/gold/myzvol
> autosync: off, max q writes: 4096, max q fbas: 16384, async threads:
> 2, mode: async, state: logging
>
> While we should be getting a minimal performance hit (hopefully), we
> got a big performance hit; disk throughput was reduced to almost 10%
> of the normal rate.

Is it possible to share information on your ZFS storage pool
configuration, your testing tool, testing types, and resulting data?

I just downloaded Solaris Express CE (b105)
http://opensolaris.org/os/downloads/sol_ex_dvd_1/, configured ZFS in
various storage pool types, SNDR with and without RAM disks, and I do
not see that disk throughput was reduced to almost 10% of the normal
rate. Yes, there is some performance impact, but nowhere near the
amount reported.

There are various factors which could come into play here, but the
most obvious reason that someone may see a serious performance
degradation as reported is that, prior to SNDR being configured, the
existing system under test was already maxed out on some system
limitation, such as CPU or memory. I/O impact should not be a factor,
given that a RAM disk is used. The addition of both SNDR and a RAM
disk in the data path, regardless of how small their system cost is,
will then have a profound impact on disk throughput.

Jim

> Please feel free to ask for any details, thanks for the help
>
> Regards
Re: [zfs-discuss] [storage-discuss] AVS on opensolaris 2008.11
Hi Jim,

Thanks for your informative reply. I am involved with kristof
(original poster) in the setup; please allow me to reply below.

> Was the following 'test' run during resynchronization mode or
> replication mode?

Neither, testing was done while in logging mode. This was chosen to
simply avoid any network "issues" and to get the setup working as fast
as possible. The setup was created with:

sndradm -E pri /dev/zvol/rdsk/gold/myzvol /dev/rramdisk/ram1 sec
/dev/zvol/rdsk/gold/myzvol /dev/rramdisk/ram1 ip async

Note that the logging disks are ramdisks, again trying to avoid disk
contention and get the fastest performance (reliability is not a
concern in this test). Before running the tests, this was the state:

# sndradm -P
/dev/zvol/rdsk/gold/myzvol <- pri:/dev/zvol/rdsk/gold/myzvol
autosync: off, max q writes: 4096, max q fbas: 16384, async threads:
2, mode: async, state: logging

While we should be getting a minimal performance hit (hopefully), we
got a big performance hit; disk throughput was reduced to almost 10%
of the normal rate.

Please feel free to ask for any details, thanks for the help.

Regards
Re: [zfs-discuss] [storage-discuss] AVS on opensolaris 2008.11
Kristof,

> Jim Yes, in step 5 commands were executed on both nodes.
>
> We did some more tests with opensolaris 2008.11 (build 101b).
>
> We managed to get AVS set up and running, but we noticed that
> performance was really bad.
>
> When we configured a zfs volume for replication, we noticed that
> write performance went down from 50 MB/s to 5 MB/s.

SNDR replication has three modes of operation, and I/O performance
varies quite differently for each one. They are:

1). Logging mode - As primary volume write I/Os occur, the bitmap
volume is used to scoreboard unreplicated write I/Os, at which time
the write I/O completes.

2). Resynchronization mode - A resynchronization thread traverses the
scoreboard, in block order, replicating write I/Os for each bit set.
Concurrently, as primary volume write I/Os occur, the bitmap volume is
used to scoreboard unreplicated write I/Os. For write I/Os that occur
(block-order wise) after the resynchronization point, the write I/O
completes. Write I/Os that occur before the resynchronization point
must be synchronously replicated in place. At the start of
resynchronization, almost all write I/Os complete quickly, as they
occur after the resynchronization point. As resynchronization nears
completion, almost all write I/Os complete slowly, as they occur
before the resynchronization point. When the resynchronization point
reaches the end of the scoreboard, the SNDR primary and secondary
volumes are 100% identical, write-order consistent, and asynchronous
replication begins.

3). Replication mode - Primary volume write I/Os are queued up to
SNDR's memory queue (or optionally configured disk queue), and
scoreboarded for replication, at which time the write I/O completes.
In the background, multiple asynchronous flusher threads dequeue
unreplicated I/Os from SNDR's memory or disk queue.

On configurations with ample system resources, write performance for
both logging mode and replication mode should be nearly identical.
The duration that a replica is in resynchronization mode is influenced
by the amount of write I/Os that occurred while the replica was in
logging mode, the amount of primary volume write I/Os while
resynchronization is also active, the network bandwidth and latency
between primary and secondary nodes, and the I/O performance of the
remote node's secondary volume.

First-time synchronization, done after the SNDR enable "sndradm -e
...", is identical to resynchronization, except the bitmap volume is
intentionally set to all ones, forcing every block to be replicated
from primary to secondary. Now, if one configured replication before
the initial "zpool create", the SNDR primary and secondary volumes
both contain uninitialized data, and thus can be considered equal;
therefore no synchronization is needed. This is accomplished by using
the "sndradm -E ..." option, setting the bitmap volume to all zeros.
This means that the switch from logging mode to replication mode is
nearly instant.

If one has a ZFS storage pool, plus available storage that can be
provisioned as zpool replacement volumes, these replacement volumes
can be "sndradm -E ..." enabled first. Now, when the "zpool replace
..." command is invoked, the write I/Os issued by ZFS to populate the
replacement volume will cause SNDR to replicate only those write I/Os.
This operation is done under SNDR's replication mode, not
synchronization mode, and is also a ZFS background operation. Once the
zpool replace is complete, the previously used storage can be
reclaimed.

> A few notes about our test setup:
>
> * Since replication is configured in logging mode, there is zero
> network traffic
> * Since rdc_bitmap_mode has been configured for memory, and even
> more, since the bitmap device is a ramdisk, any data I/O on the
> replicated volume results only in a single memory bit flip (per 32k
> of disk space)
> * This setup is the bare minimum in the sense that the kernel driver
> only hooks disk writes and flips a bit in memory; it cannot go any
> faster!

Was the following 'test' run during resynchronization mode or
replication mode?

> The Test
>
> * All tests were performed using the following command line
> # dd if=/dev/zero of=/dev/zvol/rdsk/gold/xxVolNamexx oflag=dsync bs=256M count=10
>
> * Option 'dsync' is chosen to try to avoid ZFS's aggressive caching.
> Moreover, a couple of runs were usually launched initially to fill
> the ZFS cache and to force real writing to disk
> * Option 'bs=256M' was used in order to avoid the overhead of
> copying multiple small blocks to kernel memory before disk writes. A
> larger bs ensures max throughput. Smaller values were used without
> much difference, though
> --
> This message posted from opensolaris.org
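The -e / -E distinction Jim describes comes down to how the scoreboard starts out. A toy illustration in ordinary shell (not the Solaris-only sndradm tool itself), with one character standing in for one bitmap bit:

```shell
# Toy scoreboard for an 8-chunk volume, one character per 32 KB chunk.
# 'sndradm -e' starts with every bit set: a full sync must replicate
#   every chunk from primary to secondary.
# 'sndradm -E' starts with every bit clear: the volumes are treated as
#   already identical, so the switch to replication mode is nearly instant.
board_e="11111111"
board_E="00000000"

pending_chunks() { printf '%s' "$1" | tr -cd '1' | wc -c; }

echo "-e: $(pending_chunks "$board_e") chunks pending"
echo "-E: $(pending_chunks "$board_E") chunks pending"
```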