On 05/31/2010 02:52 PM, Mike Gerdts wrote:
> On Mon, May 31, 2010 at 4:32 PM, Sandon Van Ness <san...@van-ness.com> wrote:
>   
>> On 05/31/2010 01:51 PM, Bob Friesenhahn wrote:
>>     
>>> There are multiple factors at work.  Your OpenSolaris should be new
>>> enough to have the fix in which the zfs I/O tasks are run in a
>>> scheduling class at lower priority than normal user processes.
>>> However, there is also a throttling mechanism for processes which
>>> produce data faster than can be consumed by the disks.  This
>>> throttling mechanism depends on the amount of RAM available to zfs and
>>> the write speed of the I/O channel.  More available RAM results in
>>> more write buffering, which results in a larger chunk of data written
>>> at the next transaction group write interval.  The maximum size of a
>>> transaction group may be configured in /etc/system similar to:
>>>
>>> * Set ZFS maximum TXG group size to 2684354560
>>> set zfs:zfs_write_limit_override = 0xa0000000
>>>
>>> If the transaction group is smaller, then zfs will need to write more
>>> often.  Processes will still be throttled but the duration of the
>>> delay should be smaller due to less data to write in each burst.  I
>>> think that (with multiple writers) the zfs pool will be "healthier"
>>> and less fragmented if you can offer zfs more RAM and accept some
>>> stalls during writing.  There are always tradeoffs.
>>>
>>> Bob
>>>       
>> Well, it seems like messing with the txg sync times and similar settings
>> did make the transfer smoother, but it didn't actually help with speeds;
>> it just meant the hangs happened for a shorter time at a shorter
>> interval, and actually lowering the time between writes seemed to make
>> things slightly worse.
>>
>> I think I have come to the conclusion that the problem here is CPU,
>> because it only does this with parity raid. If it were I/O bound I would
>> expect the same behavior either way; if anything the I/O load is heavier
>> on non-parity raid because it is no longer CPU bottlenecked (a dd write
>> test gives me nearly 700 megabytes/sec vs 450 with parity raidz2).
>>     
> To see if the CPU is pegged, take a look at the output of:
>
> mpstat 1
> prstat -mLc 1
>
> If mpstat shows that the idle time reaches 0 or the process' latency
> column is more than a few tenths of a percent, you are probably short
> on CPU.
>
> It could also be that interrupts are stealing cycles from rsync.
> Placing it in a processor set with interrupts disabled in that
> processor set may help.
>
>   

Unfortunately, none of these utilities make it possible to get values for
intervals of less than 1 second, which is the duration of the hang (it's
happening for about half a second).
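
(One way to look inside those sub-second stalls would be DTrace sampling,
since it isn't limited to 1-second intervals. A rough sketch only, with an
arbitrary 200 ms tick rather than anything taken from this thread; it counts
which processes are on CPU and prints every fifth of a second:

dtrace -n 'profile-997 { @[execname] = count(); } tick-200ms { printa(@); trunc(@); }'
)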

Here is what I get with mpstat:

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    0 20229 20029 23175   25  222  972    0   322    0  31   0  69
  1    0   0    0    43    9 14926   10  221  909    0 44606   47  14   0  40
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    0 20230 20029 23618   27  210  914    0   492    0  31   0  69
  1    0   0    0    50    6 14614   12  215  875    0 45930   47  14   0  39
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    0 11551 11335 12715   39  154  702    0   385    0  62   0  38
  1    0   0    0  2457 2379 8545   35  150  739    0 24349   25  53   0  22
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0   14 12804 12591 14333  243  204  610    0   794    1  20   0  79
  1    0   0    4   977  863 9561   11  210  559    0 24590   27  10   0  63
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    0 20239 20035 22932   46  224  996    0   537    0  31   0  69
  1    0   0    0    66    7 15498   12  227  930    0 44808   46  14   0  39

Here is what I get with prstat:

Total: 57 processes, 260 lwps, load averages: 2.15, 2.16, 2.15
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
  1326 root      47  11 0.0 0.0 0.0 0.0  40 1.6  6K  29 45K   0 rsync/1
  1322 root     0.0 1.4 0.0 0.0 0.0 0.0  93 5.4  6K   0   0   0 rsync/1
 22644 root     0.0 0.1 0.0 0.0 0.0 0.0 100 0.0  16   9 255   0 prstat/1
   196 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0  15   0 105   0 nscd/17
 14409 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   2   3  31   0 sshd/1
   446 root     0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   1   0   1   0 fmd/24
   196 root     0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   1   0   1   0 nscd/13
     5 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 zpool-rpool/15
     5 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 zpool-rpool/14
     5 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 zpool-rpool/13
     5 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 zpool-rpool/12
     5 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 zpool-rpool/11
     5 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 zpool-rpool/10
     5 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 zpool-rpool/9
     5 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 zpool-rpool/8
Total: 57 processes, 260 lwps, load averages: 2.15, 2.16, 2.15
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
   604 root     0.0  33 0.0 0.0 0.0 0.0  42  25  18  13   0   0 zpool-data/13
   604 root     0.0  30 0.0 0.0 0.0 0.0  41  29  12  12   0   0 zpool-data/15
  1326 root      12 2.9 0.0 0.0 0.0 0.0  85 0.4  1K  12 11K   0 rsync/1
   604 root     0.0  15 0.0 0.0 0.0 0.0  41  44 111   9   0   0 zpool-data/27
   604 root     0.0  14 0.0 0.0 0.0 0.0  43  42  72   3   0   0 zpool-data/33
   604 root     0.0 5.9 0.0 0.0 0.0 0.0  41  53 109   6   0   0 zpool-data/19
   604 root     0.0 5.4 0.0 0.0 0.0 0.0  42  53 106   8   0   0 zpool-data/25
   604 root     0.0 5.3 0.0 0.0 0.0 0.0  43  51 107   7   0   0 zpool-data/21
   604 root     0.0 4.5 0.0 0.0 0.0 0.0  41  54 110   4   0   0 zpool-data/31
   604 root     0.0 3.9 0.0 0.0 0.0 0.0  41  55 109   3   0   0 zpool-data/23
   604 root     0.0 3.7 0.0 0.0 0.0 0.0  44  52 111   2   0   0 zpool-data/29
  1322 root     0.0 0.4 0.0 0.0 0.0 0.0  98 2.0  1K   0   1   0 rsync/1
 22644 root     0.0 0.2 0.0 0.0 0.0 0.0 100 0.0  16  13 255   0 prstat/1
 14409 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   5   3  69   0 sshd/1
   196 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0  15   2 105   0 nscd/17
Total: 57 processes, 260 lwps, load averages: 2.15, 2.16, 2.15
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
  1326 root      44  11 0.0 0.0 0.0 0.0  44 1.3  5K  24 42K   0 rsync/1
  1322 root     0.0 1.4 0.0 0.0 0.0 0.0  93 5.9  5K   0   0   0 rsync/1
 22644 root     0.0 0.2 0.0 0.0 0.0 0.0 100 0.0  16   0 255   0 prstat/1
   604 root     0.0 0.1 0.0 0.0 0.0 0.0 100 0.0   1   0   0   0 zpool-data/2
 14409 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0  14   1 133   0 sshd/1
   196 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0  15   0 105   0 nscd/17
   604 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0  62   0   0   0 zpool-data/3
   604 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0  28   0   0   0 zpool-data/39
   196 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   1   1   5   0 nscd/7
     5 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 zpool-rpool/15
     5 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 zpool-rpool/14
     5 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 zpool-rpool/13
     5 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 zpool-rpool/12
     5 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 zpool-rpool/11
     5 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 zpool-rpool/10
Total: 57 processes, 260 lwps, load averages: 2.15, 2.16, 2.15
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
  1326 root      47  11 0.0 0.0 0.0 0.0  40 1.5  6K  34 45K   0 rsync/1
  1322 root     0.0 1.4 0.0 0.0 0.0 0.0  93 5.8  6K   0   0   0 rsync/1
 22644 root     0.0 0.1 0.0 0.0 0.0 0.0 100 0.0  16   1 255   0 prstat/1
   196 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0  15   0 105   0 nscd/17
 14409 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   2   3  31   0 sshd/1
   178 root     0.0 0.0 0.0 0.0 0.0 100 0.0 0.0   1   0   2   0 devfsadm/3
     5 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 zpool-rpool/15
     5 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 zpool-rpool/14
     5 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 zpool-rpool/13
     5 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 zpool-rpool/12
     5 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 zpool-rpool/11
     5 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 zpool-rpool/10
     5 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 zpool-rpool/9
     5 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 zpool-rpool/8
     5 root     0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 zpool-rpool/7
Total: 57 processes, 260 lwps, load averages: 2.15, 2.16, 2.15
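
As for the interrupt angle: if interrupts do turn out to be stealing cycles
from rsync, the processor-set approach Mike suggested would look roughly like
the sketch below (the CPU id, the resulting set id, and the rsync arguments
are placeholders, not values from this system):

# create a processor set containing CPU 1; psrset prints the new set id
psrset -c 1
# disable interrupt handling on the CPUs in that set (assuming set id 1)
psrset -f 1
# run rsync with its threads confined to the set
psrset -e 1 rsync ...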
