On Mon, May 31, 2010 at 4:32 PM, Sandon Van Ness <san...@van-ness.com> wrote:
> On 05/31/2010 01:51 PM, Bob Friesenhahn wrote:
>> There are multiple factors at work.  Your OpenSolaris should be new
>> enough to have the fix in which the zfs I/O tasks are run in a
>> scheduling class at a lower priority than normal user processes.
>> However, there is also a throttling mechanism for processes which
>> produce data faster than can be consumed by the disks.  This
>> throttling mechanism depends on the amount of RAM available to zfs and
>> the write speed of the I/O channel.  More available RAM results in
>> more write buffering, which results in a larger chunk of data written
>> at the next transaction group write interval.  The maximum size of a
>> transaction group may be configured in /etc/system similar to:
>>
>> * Set ZFS maximum TXG group size to 2684354560
>> set zfs:zfs_write_limit_override = 0xa0000000
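>>
>> (For reference, 0xa0000000 is 10 x 268435456 = 2684354560 bytes, or
>> 2.5 GiB, so the comment and the value agree.)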
>>
>> If the transaction group is smaller, then zfs will need to write more
>> often.  Processes will still be throttled but the duration of the
>> delay should be smaller due to less data to write in each burst.  I
>> think that (with multiple writers) the zfs pool will be "healthier"
>> and less fragmented if you can offer zfs more RAM and accept some
>> stalls during writing.  There are always tradeoffs.
>>
>> Bob
> Messing with the txg sync times and similar tunables did make the
> transfer smoother, but it didn't actually help with speeds: the hangs
> just happened for a shorter time at a smaller interval. Actually
> lowering the time between writes seemed to make things slightly worse.
>
> I think I have come to the conclusion that the problem here is CPU,
> since it only happens with parity raid. If the bottleneck were I/O, I
> would expect the same behavior either way; if anything, the non-parity
> config is heavier on I/O because it is no longer CPU-bottlenecked (a dd
> write test gives me near 700 megabytes/sec vs. 450 with raidz2).
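
For comparison, a large-block sequential dd test along these lines is a
common way to measure raw write throughput (the pool path and sizes
here are just placeholders):

dd if=/dev/zero of=/tank/ddtest bs=1048576 count=10240

Keep in mind that /dev/zero data compresses to nothing, so this is only
a fair test on a dataset without compression enabled.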

To see if the CPU is pegged, take a look at the output of:

mpstat 1
prstat -mLc 1

If mpstat shows that the idle time (idl) reaches 0, or if the
process's latency (LAT) column in prstat is more than a few tenths of
a percent, you are probably short on CPU.
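
To watch just the rsync threads, something like this should work
(assuming the process is named rsync):

prstat -mLc -p `pgrep -d, rsync` 1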

It could also be that interrupts are stealing cycles from rsync.
Placing rsync in a processor set whose CPUs have interrupts disabled
may help.
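
A minimal sketch (the CPU IDs and pset ID below are examples; pick CPUs
that mpstat shows are not busy handling interrupts):

psrset -c 2 3          # create a processor set from CPUs 2 and 3
psradm -i 2 3          # exclude those CPUs from interrupt handling
psrset -e 1 rsync ...  # run rsync in pset 1 (the ID psrset -c printed)

intrstat 1 will show you which CPUs the interrupt load is landing on.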

-- 
Mike Gerdts
http://mgerdts.blogspot.com/