On 05/31/2010 02:32 PM, Sandon Van Ness wrote:
> Well, it seems like messing with the txg sync times and related
> tunables did make the transfer smoother, but it didn't actually help
> with speeds; it just meant the hangs happened for a shorter time at a
> smaller interval, and actually lowering the time between writes seemed
> to make things slightly worse.
>
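For anyone wanting to poke at the same thing: the txg tunables in
question (zfs_txg_timeout and zfs_txg_synctime on recent OpenSolaris
builds) can be changed at runtime with mdb. A rough sketch; the value
here is just an example:

  # sync a txg at least every 5 seconds instead of the default 30
  echo zfs_txg_timeout/W0t5 | mdb -kw
  # read the current value back
  echo zfs_txg_timeout/D | mdb -k
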
> I think I have come to the conclusion that the problem here is CPU,
> due to the fact that it's only doing this with parity RAID. If it
> were I/O-bound I would expect the same behavior either way; if
> anything the I/O is heavier on non-parity RAID, since it is no longer
> CPU-bottlenecked (a dd write test gives me near 700 megabytes/sec vs.
> 450 with raidz2).
>
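For the curious, the dd write test was of this general form (the path
and count are placeholders; note that /dev/zero compresses away to
nothing if compression is enabled, so it only gives a rough number):

  # write ~20 GB of zeros to the pool in 1 MB blocks
  dd if=/dev/zero of=/data/ddtest bs=1048576 count=20000
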
> So if I am understanding things correctly, the issue I am seeing
> should be fixed, but apparently it's not (in my case), as CPU usage
> from the parity/ZFS calculations is taking precedence over my process
> doing the writing (rsync)?
>
> I have now all but 100% come to the conclusion that the issue is
> CPU-based, given that I saw the same dips even when using mbuffer.

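The mbuffer test was a pipeline of this general shape (host, port,
buffer and block sizes are placeholders, not the exact values used):

  # receiver: take the stream off the network into a 1 GB RAM buffer
  mbuffer -I 9090 -s 128k -m 1G -o /data/stream.tar
  # sender: push the data through a matching buffer
  tar -cf - /some/dir | mbuffer -s 128k -m 1G -O receiver:9090
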
And here is some top output; the slowdowns occur when zpool-data starts
using CPU and rsync gets CPU-starved:

Normal activity shows:

last pid: 22635;  load avg:  2.17,  2.18,  2.16;  up 0+18:04:42    14:53:29
59 processes: 57 sleeping, 1 running, 1 on cpu
CPU states: 54.7% idle, 23.4% user, 21.9% kernel,  0.0% iowait,  0.0% swap
Kernel: 37646 ctxsw, 193 trap, 20914 intr, 45295 syscall
Memory: 4027M phys mem, 190M free mem, 2013M total swap, 2013M free swap

   PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
  1326 root        1  59  -20  383M   44M run    496:47 28.87% rsync
  1322 root        1  59  -20  383M  357M sleep   11:21  0.70% rsync
     3 root        1  60  -20    0K    0K sleep    1:24  0.06% fsflush

When starved:

last pid: 22636;  load avg:  2.16,  2.18,  2.16;  up 0+18:05:16    14:54:03
59 processes: 57 sleeping, 2 on cpu
CPU states: 24.9% idle, 10.5% user, 64.6% kernel,  0.0% iowait,  0.0% swap
Kernel: 17855 ctxsw, 18 trap, 12831 intr, 21090 syscall
Memory: 4027M phys mem, 198M free mem, 2013M total swap, 2013M free swap

   PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
   604 root       39  99  -20    0K    0K cpu/0  316:55 53.36% zpool-data
  1326 root        1  59  -20  383M   44M sleep  497:03 13.49% rsync
  1322 root        1  59  -20  383M  357M sleep   11:21  0.33% rsync
 22635 root        1  59    0 3852K 1912K cpu/1    0:00  0.06% top
     3 root        1  60  -20    0K    0K sleep    1:24  0.06% fsflush

The stall actually lasts less than a second, but the Solaris version of
top doesn't seem to accept values below 1 (other than 0) for -s the way
you can on Linux (-d .5); otherwise I think zpool-data would show near
100% CPU during the stall.
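
For anyone wanting to catch it anyway, DTrace can sample at sub-second
granularity; a rough sketch (untested here) that prints which processes
are on CPU every 100 ms:

  dtrace -n 'profile-997 { @[execname] = count(); }
             tick-100ms { printa(@); trunc(@); }'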