On Fri, Jun 26, 2015 at 04:59:32AM +0300, Sergey Ryazanov wrote:
> Hello,
> 
> while building an L2TP tunnel with xl2tpd-1.3.1 I ran into very low
> upload performance. When downloading, the speed is 20 Mbit/s at
> nearly 100% CPU utilization (CPU is a Pentium D 930, 3 GHz). When
> uploading, the speed is below 2 Mbit/s at nearly zero CPU utilization.
> 
> First, I examined the xl2tpd code and did not find any potential
> issues. Then I compiled it with the -pg option and ran a quick test
> with iperf(1): 4 TCP flows toward the L2TP server, 2-minute test.
> Then I ran gprof and got pretty strange output:
> 
> Full output: http://pastebin.com/dm4wz6VK
> granularity: each sample hit covers 4 byte(s) for 11.11% of 0.09 seconds
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  77.8       0.07     0.07    24311     0.00     0.00  write [4]
>  22.2       0.09     0.02    35897     0.00     0.00  _thread_sys_sendmsg [7]
>   0.0       0.09     0.00   109136     0.00     0.00  memset [46]
>   0.0       0.09     0.00   107997     0.00     0.00  ___errno [926]
>   0.0       0.09     0.00    71740     0.00     0.00  read_packet [47]
>   0.0       0.09     0.00    71726     0.00     0.00  _thread_sys_read [927]
>   0.0       0.09     0.00    60154     0.00     0.00  _thread_sys_gettimeofday [928]
>   0.0       0.09     0.00    60114     0.00     0.00  process_signal [32]
>   0.0       0.09     0.00    60113     0.00     0.00  build_fdset [29]
>   0.0       0.09     0.00    60113     0.00     0.00  process_schedule [26]
>   0.0       0.09     0.00    60113     0.00     0.00  select [48]
>   0.0       0.09     0.00    35897     0.00     0.00  udp_xmit [6]
> 
> Another run also gives something strange:
> 
> Full output: http://pastebin.com/vXvHPuyj
> granularity: each sample hit covers 4 byte(s) for 14.29% of 0.07 seconds
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  71.4       0.05     0.05    24348     0.00     0.00  write [4]
>  14.3       0.06     0.01    60265     0.00     0.00  select [6]
>  14.3       0.07     0.01       10     1.00     1.00  calloc [7]
>   0.0       0.07     0.00   109324     0.00     0.00  memset [55]
>   0.0       0.07     0.00   108285     0.00     0.00  ___errno [925]
>   0.0       0.07     0.00    71997     0.00     0.00  read_packet [56]
>   0.0       0.07     0.00    71980     0.00     0.00  _thread_sys_read [926]
>   0.0       0.07     0.00    60282     0.00     0.00  _thread_sys_gettimeofday [927]
>   0.0       0.07     0.00    60266     0.00     0.00  process_signal [50]
>   0.0       0.07     0.00    60265     0.00     0.00  build_fdset [49]
>   0.0       0.07     0.00    60265     0.00     0.00  process_schedule [57]
>   0.0       0.07     0.00    36020     0.00     0.00  _thread_sys_sendmsg [928]
>   0.0       0.07     0.00    36020     0.00     0.00  udp_xmit [58]
> 
> For comparison I changed direction and did a download test:
> 
> Full output: http://pastebin.com/FFF6xGcy
> granularity: each sample hit covers 4 byte(s) for 0.01% of 114.30 seconds
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  93.7     107.06   107.06   370804     0.29     0.29  write [5]
>   1.8     109.08     2.02   370770     0.01     0.01  recvmsg [6]
>   1.4     110.67     1.59   370799     0.00     0.00  _thread_sys_gettimeofday [8]
>   1.1     111.90     1.23   370770     0.00     0.30  handle_packet [4]
>   1.0     113.01     1.11   370761     0.00     0.00  add_fcs [9]
>   0.7     113.81     0.81   370781     0.00     0.00  select [10]
>   0.2     114.03     0.21        1   210.00 114172.55  network_thread [3]
>   0.1     114.14     0.12                             mcount [11]
>   0.0     114.19     0.05   370781     0.00     0.00  build_fdset [12]
>   0.0     114.23     0.04   370771     0.00     0.00  get_call [13]
>   0.0     114.27     0.03  1112701     0.00     0.00  memset [14]
>   0.0     114.28     0.02   370781     0.00     0.00  process_schedule [7]
>   0.0     114.30     0.01       29     0.34     0.34  _thread_sys_sendmsg [20]
>   0.0     114.30     0.01        8     0.62     0.74  inet_ntoa [24]
>   0.0     114.30     0.00   370785     0.00     0.00  swaps [110]
>   0.0     114.30     0.00   370782     0.00     0.00  process_signal [41]
>   0.0     114.30     0.00   370770     0.00     0.00  recycle_buf [111]
> 
> During the upload tests, it looks as if xl2tpd does not perform any
> work and is stuck somewhere in an I/O operation.
> 
> Maybe there are some options that could be tuned to speed up ppp(4)
> I/O performance, or did I miss something in my tests? I am in doubt.
> Any clues?
> 

Can you get a ktrace output to figure out what write is doing?
Could it be that it busy loops with EINTR or EAGAIN?
It sure smells like something is going on there.

-- 
:wq Claudio
