Public bug reported:

SRU Justification:

Impact:

Reduced TCP/IP receive performance for network devices that do not split
packet headers into the skb linear area (e.g., mlx4).  The trusty kernel
has incorporated

commit eff44f9cc9a02aad53d568d3ae5020b6792ae4f6
Author: Jerry Chu <hk...@google.com>
Date:   Wed Dec 11 20:53:45 2013 -0800

    net-gro: Prepare GRO stack for the upcoming tunneling support

which modifies the GRO frag0 optimization, but unfortunately in some
cases results in a call to __pskb_pull_tail for every packet received
via the GRO path.  This reduces TCP receive performance (or, more
accurately, increases the CPU load of TCP receive processing, which
reduces throughput for CPU-limited workloads).
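
To illustrate why the extra pulls cost CPU, here is a toy Python model of
the idea, not the kernel code.  ToySkb, pull_tail, header_slow,
header_fast and receive are hypothetical names invented for this sketch;
it assumes each packet arrives with an empty linear area and all of its
data in a single fragment, and counts how often header access has to
copy data into the linear area.

```python
from dataclasses import dataclass, field


@dataclass
class ToySkb:
    """Toy stand-in for an sk_buff: a linear area plus one page fragment."""
    linear: bytearray = field(default_factory=bytearray)
    frag0: bytearray = field(default_factory=bytearray)
    pulls: int = 0  # number of copies from frag0 into the linear area

    def pull_tail(self, n: int) -> None:
        """Copy from frag0 until the linear area holds n bytes.

        Models the cost of a pull; it is not the kernel's implementation.
        """
        need = n - len(self.linear)
        if need > 0:
            self.linear += self.frag0[:need]
            del self.frag0[:need]
            self.pulls += 1


def header_slow(skb: ToySkb, offset: int, length: int) -> bytes:
    """Always linearize the header first, then read it."""
    skb.pull_tail(offset + length)
    return bytes(skb.linear[offset:offset + length])


def header_fast(skb: ToySkb, offset: int, length: int) -> bytes:
    """Read the header in place from frag0 when that is possible."""
    if not skb.linear and offset + length <= len(skb.frag0):
        return bytes(skb.frag0[offset:offset + length])
    return header_slow(skb, offset, length)


def receive(packets, read_header) -> int:
    """Parse two 20-byte headers per packet; return the total pull count."""
    total = 0
    for payload in packets:
        skb = ToySkb(frag0=bytearray(payload))
        read_header(skb, 0, 20)   # stands in for an IPv4 header
        read_header(skb, 20, 20)  # stands in for a TCP header
        total += skb.pulls
    return total


if __name__ == "__main__":
    packets = [bytes(64)] * 3
    print("pulls without in-place reads:", receive(packets, header_slow))
    print("pulls with in-place reads:   ", receive(packets, header_fast))
```

In this model the in-place path never copies, while the other path
copies on every header access of every packet, which is the kind of
per-packet overhead described above.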

Fix:

This has already been fixed in mainline in

commit a50e233c50dbc881abaa0e4070789064e8d12d70
Author: Eric Dumazet <eduma...@google.com>
Date:   Sat Mar 29 21:28:21 2014 -0700

    net-gro: restore frag0 optimization

The fix has been backported to and verified on the trusty kernel using
mlx4 devices and iperf; throughput increased from 7.5 to 8.5 Gb/sec
with the patch applied, and the relevant portions of the perf captures
show the call paths changing from:

     7.17%            iperf  [kernel.kallsyms]   [k] __pskb_pull_tail
                      |
                      --- __pskb_pull_tail
                         |          
                         |--48.03%-- tcp_gro_receive
                         |          tcp4_gro_receive
                         |          inet_gro_receive
                         |          dev_gro_receive
                         |          napi_gro_frags
                         |          mlx4_en_process_rx_cq
                         |          mlx4_en_poll_rx_cq
                         |          net_rx_action
                         |          __do_softirq
[...]
                         |--28.53%-- napi_gro_frags
                         |          mlx4_en_process_rx_cq
                         |          mlx4_en_poll_rx_cq
                         |          net_rx_action
                         |          __do_softirq
[...]
                         |--13.11%-- inet_gro_receive
                         |          dev_gro_receive
                         |          napi_gro_frags
                         |          mlx4_en_process_rx_cq
                         |          mlx4_en_poll_rx_cq
                         |          net_rx_action
                         |          __do_softirq

to:

     4.87%          iperf  [kernel.kallsyms]   [k] skb_gro_receive
                    |
                    --- skb_gro_receive
                       |          
                       |--98.13%-- tcp_gro_receive
                       |          tcp4_gro_receive
                       |          inet_gro_receive
                       |          dev_gro_receive
                       |          napi_gro_frags
                       |          mlx4_en_process_rx_cq
                       |          mlx4_en_poll_rx_cq
                       |          net_rx_action
                       |          __do_softirq

Testcase:

The fix was tested using mlx4 10 Gb/sec network devices between two arm64
systems, running "iperf -s" on one end and "iperf -c" on the other.  The
unmodified kernel reported approximately 7.5 Gb/sec throughput; the
fixed kernel reported approximately 8.5 Gb/sec.

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1344323

Title:
  Trusty kernel network performance regression
