Hi All,

We are currently doing a zfs send/recv with mbuffer to send incremental changes across, and it seems to be running quite slowly, with zfs receive as the apparent bottleneck.
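To get one level deeper than the flat hotkernel counts pasted below, I'm planning to grab full kernel stacks for the receive process with something roughly like the following one-liner (just a sketch: the 997 Hz rate and 30 second window are arbitrary, and it assumes the DTrace profile provider is available on this box):

  # sample kernel stacks while the zfs process is on CPU, for 30 seconds
  dtrace -n 'profile-997 /execname == "zfs"/ { @[stack()] = count(); } tick-30s { exit(0); }'

That should show which code path is calling genunix`list_next so heavily, rather than only per-function sample counts.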
The zfs receive process itself seems to be using almost 100% of a single CPU in "sys" time. Wondering if anyone has any ideas whether this is normal or if it is just going to run forever and never finish...

Details: two machines connected via Gigabit Ethernet on the same LAN.

Sending server:
  zfs send -i 20111201_1 data@20111205_1 | mbuffer -s 128k -m 1G -O tdp03r-int:9090

Receiving server:
  mbuffer -s 128k -m 1G -I 9090 | zfs receive -vF tank/db/data

mbuffer is showing:
  in @ 256 KiB/s, out @ 256 KiB/s, 306 GiB total, buffer 100% full

My debugging so far:

DTraceToolkit hotkernel reports:
  zfs`lzjb_decompress                   10   0.0%
  unix`page_nextn                       31   0.0%
  genunix`fsflush_do_pages              37   0.0%
  zfs`dbuf_free_range                  183   0.1%
  genunix`list_next                   5822   3.7%
  unix`mach_cpu_idle                150261  96.1%

top shows:
    PID USERNAME NLWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
  22945 root        1  60    0   13M 3004K cpu/6  144:21  3.79% zfs
    550 root       28  59    0   39M   22M sleep   10:19  0.06% fmd

I'd say the 3.7% or so here only looks low because these figures are aggregated across all CPUs rather than reported per CPU. mpstat seems to show the real story.

mpstat 1 shows output much like this each second:

  CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
    0    0   0    0   329  108   83    0   17    3    0     0    0   0   0 100
    1    0   0    0   100    1   94    0   23    1    0     0    0   0   0 100
    2    0   0    0    32    0   28    0    5    1    0     0    0   0   0 100
    3    0   0    0    18    0   11    0    0    0    0     0    0   0   0 100
    4    0   0    0    16    6   10    0    2    0    0     0    0   0   0 100
    5    0   0    0     6    0    2    0    0    0    0     0    0   0   0 100
    6    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
    7    0   0    0     9    0    4    0    0    0    0    16    0   0   0 100
    8    0   0    0     6    0    3    0    0    0    0     0    0   3   0  97
    9    0   0    0     3    1    0    0    0    0    0     0    0   0   0 100
   10    0   0    0    22    2   35    0    1    1    0     0    0  89   0  11
   11    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
   12    0   0    0     3    0    2    0    1    0    0     2    0   0   0 100
   13    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
   14    0   0    0    24   17    6    0    0    2    0    61    0   0   0 100
   15    0   0    0    14    0   24    0    0    1    0     2    0   0   0 100
   16    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
   17    0   0    0    10    2    8    0    0    5    0    78    0   1   0  99
   18    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
   19    0   0    0     5    1    2    0    0    0    0    10    0   0   0 100
   20    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100
   21    0   0    0     9    2    4    0    0    0    0     4    0   0   0 100
   22    0   0    0     4    0    0    0    0    0    0     0    0   0   0 100
   23    0   0    0     2    0    0    0    0    0    0     0    0   0   0 100

So I'm led to believe that zfs receive is spending almost 100% of a single CPU's time doing a lot of genunix`list_next ...

Any ideas what is going on here?

Best Regards,
--
Lachlan Mulcahy
Senior DBA, Marin Software Inc.
San Francisco, USA
AU Mobile: +61 458 448 721
US Mobile: +1 (415) 867 2839
Office   : +1 (415) 671 6080