Hi All,

We are currently doing a zfs send/recv with mbuffer to send incremental
changes across, and it seems to be running quite slowly, with zfs receive
the apparent bottleneck.

The zfs receive process itself seems to be using almost 100% of a single
CPU in "sys" time.

Wondering if anyone has any idea whether this is normal, or whether it is
just going to run forever and never finish...


Details: two machines connected via Gigabit Ethernet on the same LAN.

Sending server:

zfs send -i 20111201_1 data@20111205_1 | mbuffer -s 128k -m 1G -O tdp03r-int:9090

Receiving server:

mbuffer -s 128k -m 1G -I 9090 | zfs receive -vF tank/db/data

mbuffer showing:

in @  256 KiB/s, out @  256 KiB/s,  306 GiB total, buffer 100% full
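
In case anyone suspects the link itself: the same mbuffer pair can be run
without zfs on either end as a raw network throughput check. This is just an
untested sketch reusing the hostname and port from above (the dd count is
arbitrary, roughly 10 GiB of zeroes):

Receiving server:

mbuffer -s 128k -m 1G -I 9090 > /dev/null

Sending server:

dd if=/dev/zero bs=128k count=81920 | mbuffer -s 128k -m 1G -O tdp03r-int:9090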



My debug:

DTraceToolkit hotkernel reports:

zfs`lzjb_decompress                                        10   0.0%
unix`page_nextn                                            31   0.0%
genunix`fsflush_do_pages                                   37   0.0%
zfs`dbuf_free_range                                       183   0.1%
genunix`list_next                                        5822   3.7%
unix`mach_cpu_idle                                     150261  96.1%
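
In case it helps, the next thing I was planning to try is grabbing kernel
stacks for list_next to see who is calling it, something along these lines
(untested sketch; the interval and truncation count are arbitrary):

dtrace -n 'fbt:genunix:list_next:entry { @[stack()] = count(); } tick-30s { trunc(@, 10); exit(0); }'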


Top shows:

   PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 22945 root        1  60    0   13M 3004K cpu/6  144:21  3.79% zfs
   550 root       28  59    0   39M   22M sleep   10:19  0.06% fmd

I'd say the 3.7% or so here looks low only because it is reported as a share
of all 24 CPUs rather than of a single CPU (3.7% of 24 CPUs is roughly 90%
of one). mpstat seems to show the real story.

mpstat 1 shows output much like this each second:

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    0   329  108   83    0   17    3    0     0    0   0   0 100
  1    0   0    0   100    1   94    0   23    1    0     0    0   0   0 100
  2    0   0    0    32    0   28    0    5    1    0     0    0   0   0 100
  3    0   0    0    18    0   11    0    0    0    0     0    0   0   0 100
  4    0   0    0    16    6   10    0    2    0    0     0    0   0   0 100
  5    0   0    0     6    0    2    0    0    0    0     0    0   0   0 100
  6    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
  7    0   0    0     9    0    4    0    0    0    0    16    0   0   0 100
  8    0   0    0     6    0    3    0    0    0    0     0    0   3   0  97
  9    0   0    0     3    1    0    0    0    0    0     0    0   0   0 100
 10    0   0    0    22    2   35    0    1    1    0     0    0  89   0  11
 11    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
 12    0   0    0     3    0    2    0    1    0    0     2    0   0   0 100
 13    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
 14    0   0    0    24   17    6    0    0    2    0    61    0   0   0 100
 15    0   0    0    14    0   24    0    0    1    0     2    0   0   0 100
 16    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
 17    0   0    0    10    2    8    0    0    5    0    78    0   1   0  99
 18    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
 19    0   0    0     5    1    2    0    0    0    0    10    0   0   0 100
 20    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
 21    0   0    0     9    2    4    0    0    0    0     4    0   0   0 100
 22    0   0    0     4    0    0    0    0    0    0     0    0   0   0 100
 23    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100


So I'm led to believe that zfs receive is spending almost 100% of a single
CPU's time in the kernel, much of it in genunix`list_next ...
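
To double-check that it really is the zfs receive process pinning that CPU
in sys time, I was also going to watch its per-thread microstates with
prstat (22945 is the zfs pid from the top output above; untested as yet):

prstat -mL -p 22945 1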

Any ideas what is going on here?

Best Regards,
-- 
Lachlan Mulcahy
Senior DBA,
Marin Software Inc.
San Francisco, USA

AU Mobile: +61 458 448 721
US Mobile: +1 (415) 867 2839
Office : +1 (415) 671 6080