Re: [vpp-dev] VPP Performance drop from 17.04 to 17.07

Maciek Konstantynowicz (mkonstan) Mon, 28 Aug 2017 10:11:45 -0700

On 28 Aug 2017, at 17:47, Billy McFall <bmcf...@redhat.com> wrote:




On Mon, Aug 28, 2017 at 8:53 AM, Maciek Konstantynowicz (mkonstan) 
<mkons...@cisco.com> wrote:
+ csit-dev

Billy,

Per last week's CSIT project call, from the CSIT perspective we
classified your reported issue as a test coverage escape.

Summary
=======
CSIT test coverage has been fixed; see more detail below. The CSIT tests
uncovered a regression for L2BD with MAC learning at higher total numbers
of MACs in L2FIB (>>10k MACs) in multi-threaded configurations.
Single-threaded configurations do not seem to be impacted.

Billy, Karl, can you confirm this aligns with your findings?

When you say "multi-threaded configuration", I assume you mean multiple worker 
threads?

Yes, I should have said multiple data plane threads; in VPP land those are 
indeed worker threads.

Karl's tests had 4 workers, one for each NIC (physical and vhost-user). He only 
tested multi-threaded, so we cannot confirm that single-threaded 
configurations are not impacted.

Okay. Still, your results align with our tests, both CSIT and offline with IXIA.


Our numbers are a little different from yours, but we are both seeing drops 
between releases.

Your numbers are different most likely due to different MAC scale. You
quote MAC scale per direction, we quote total MAC scale, i.e. total
number of VPP l2fib entries.

We had a bigger drop-off with 10k flows, but it seems to be similar with the 
million-flow tests.

Our 10k flows test is equivalent to 2*5k flows, defined as:

    flow-ab1 => (smac-a1,dmac-b1)
    flow-ab2 => (smac-a2,dmac-b2)
    ..
    flow-ab5000 => (smac-a5000,dmac-b5000)

    flow-ba1 => (smac-b1,dmac-a1)
    flow-ba2 => (smac-b2,dmac-a2)
    ..
    flow-ba5000 => (smac-b5000,dmac-a5000)

In your case, based on the description provided by Karl on the last CSIT
call, I understand the 10k flows test has 2*10k flows, defined as:

    flow-ab1 => (smac-a1,dmac-b1)
    flow-ab2 => (smac-a2,dmac-b2)
    ..
    flow-ab10000 => (smac-a10000,dmac-b10000)

    flow-ba1 => (smac-b1,dmac-a1)
    flow-ba2 => (smac-b2,dmac-a2)
    ..
    flow-ba10000 => (smac-b10000,dmac-a10000)
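The difference between the two conventions can be sketched in a few lines
of Python. This is only an illustration, not CSIT or TRex code: the helper
names build_flows and learned_macs are made up here, and it assumes every
source MAC seen by VPP ends up as one l2fib entry.

```python
def build_flows(n_per_direction):
    """Build bidirectional flows as (smac, dmac) pairs, as listed above."""
    flows = []
    for i in range(1, n_per_direction + 1):
        flows.append((f"a{i}", f"b{i}"))  # flow-ab<i>
    for i in range(1, n_per_direction + 1):
        flows.append((f"b{i}", f"a{i}"))  # flow-ba<i>
    return flows


def learned_macs(flows):
    """Assumption: each distinct source MAC becomes one l2fib entry."""
    return {smac for smac, _dmac in flows}


for n in (5000, 10000):
    flows = build_flows(n)
    print(f"{n} flows per direction -> {len(flows)} flows, "
          f"{len(learned_macs(flows))} MACs to learn")
```

Under that assumption, 5k flows per direction gives 10k MACs in total,
while 10k flows per direction gives 20k.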

Also, your PDR packet loss tolerance of 0.002% drop rate is different
from the CSIT PDR (0.5% packet loss rate tolerance) and NDR (zero packet
loss rate tolerance).
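To show how much the tolerance matters, here is a rough Python sketch of a
pass/fail check for a single trial. The function name within_tolerance and
the packet counts are invented for illustration; this is not how CSIT or
the Red Hat harness implements the check.

```python
def within_tolerance(tx_packets, rx_packets, tolerance_pct):
    """Return True if the loss ratio does not exceed tolerance_pct percent."""
    lost = tx_packets - rx_packets
    # Compare lost/tx <= tolerance_pct/100 without dividing.
    return lost * 100 <= tolerance_pct * tx_packets


# Hypothetical trial: 1,000,000 packets sent, 100 lost (0.01% loss).
tx, rx = 1_000_000, 999_900

print("CSIT PDR (0.5%):    ", within_tolerance(tx, rx, 0.5))
print("0.002% drop rate:   ", within_tolerance(tx, rx, 0.002))
print("CSIT NDR (zero loss):", within_tolerance(tx, rx, 0.0))
```

The same trial passes the 0.5% tolerance but fails both the 0.002% and the
zero-loss criteria, so rates found with different tolerances are not
directly comparable.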


I was a little disappointed that the MAC limit change by John Lo on 8/23 didn't 
improve the master numbers somewhat.

Thanks for all the hard work and adding these additional test cases.

You are welcome. Thanks again for reporting this regression.
Let’s wait for the vpp-dev fix, so that we can retest and verify it.

-Maciek


Billy


More detail
===========
MAC scale tests have now been added to the L2BD and L2BD+vhost CSIT
suites, as a simple extension to existing L2 testing suites. Some known
issues with the TG previously prevented CSIT from adding those tests, but
now that the TG issues have been addressed, the tests could be added
swiftly. The complete list of added tests is in [1] - thanks to Peter
Mikus for great work there!

Results from running those tests multiple times within the FD.io CSIT lab
infra can be reviewed by checking the dedicated test trigger commits
[2][3][4] and the summary graphs in the linked xls [5]. The results
confirm there is a regression in VPP l2fib code affecting all scaled-up
MAC tests in multi-thread configurations. Single-thread configurations do
not seem to be impacted.

The tests in commit [1] are not merged yet, as they're waiting for the
TG/TRex team to fix a TRex issue with mis-calculating the Ethernet FCS
with a large number of L2 MAC flows (>10k MAC flows). The issue is
tracked by [6]; the ETA for TRex v2.29 with the fix is w/e 1-Sep, i.e.
this week. Reported CSIT test results use Ethernet frames with UDP
headers, which masks the TRex issue.

We have also git-bisected vpp between v17.04 (good) and v17.07 (bad) in
a separate IXIA-based lab in SJC, and found the culprit vpp patch [7].
We are awaiting a fix from vpp-dev; a jira ticket has been raised [8].
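For readers unfamiliar with bisection, the idea is a binary search for the
first bad commit between a known-good and a known-bad one. The Python
sketch below only illustrates that idea; it assumes a linear history and
uses made-up commit labels and a made-up is_bad check in place of a real
build-and-measure step. It is not the procedure used in the SJC lab.

```python
def first_bad(commits, is_bad):
    """Find the first bad commit by binary search.

    commits is ordered oldest to newest; commits[0] is known good and
    commits[-1] is known bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid    # regression already present at mid
        else:
            good = mid   # regression introduced after mid
    return commits[bad]


# Hypothetical history of 100 commits where commit c37 introduced the drop.
history = [f"c{i}" for i in range(100)]
culprit = first_bad(history, lambda c: int(c[1:]) >= 37)
print("first bad commit:", culprit)
```

Each step halves the remaining range, so about log2(N) builds and test
runs are enough for N commits.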

Many thanks for reporting this regression and working with CSIT to plug
this hole in testing.

-Maciek

[1] CSIT-786 L2FIB scale testing [https://gerrit.fd.io/r/#/c/8145/ ge8145] 
[https://jira.fd.io/browse/CSIT-786 CSIT-786];
    L2FIB scale testing for 10k, 100k, 1M FIB entries
     ./l2:
     10ge2p1x520-eth-l2bdscale10kmaclrn-ndrpdrdisc.robot
     10ge2p1x520-eth-l2bdscale100kmaclrn-ndrpdrdisc.robot
     10ge2p1x520-eth-l2bdscale1mmaclrn-ndrpdrdisc.robot
     10ge2p1x520-eth-l2bdscale10kmaclrn-eth-2vhostvr1024-1vm-cfsrr1-ndrpdrdisc
     10ge2p1x520-eth-l2bdscale100kmaclrn-eth-2vhostvr1024-1vm-cfsrr1-ndrpdrdisc
     10ge2p1x520-eth-l2bdscale1mmaclrn-eth-2vhostvr1024-1vm-cfsrr1-ndrpdrdisc
[2] VPP master branch [https://gerrit.fd.io/r/#/c/8173/ ge8173];
[3] VPP stable/1707 [https://gerrit.fd.io/r/#/c/8167/ ge8167];
[4] VPP stable/1704 [https://gerrit.fd.io/r/#/c/8172/ ge8172];
[5] CSIT-794 VPP v17.07 L2BD yields lower NDR and PDR performance vs. v17.04, 
20170825_l2fib_regression_10k_100k_1M.xlsx, 
[https://jira.fd.io/browse/CSIT-794 CSIT-794];
[6] TRex v2.28 Ethernet FCS mis-calculation issue 
[https://jira.fd.io/browse/CSIT-793 CSIT-793];
[7] commit 25ff2ea3a31e422094f6d91eab46222a29a77c4b;
[8] VPP v17.07 L2BD NDR and PDR multi-thread performance broken 
[https://jira.fd.io/browse/VPP-963 VPP-963];

On 14 Aug 2017, at 23:40, Billy McFall <bmcf...@redhat.com> wrote:

In the last VPP call, I reported that some internal Red Hat performance testing 
was showing a significant drop in performance between releases 17.04 and 17.07. 
This was with l2-bridge testing - PVP - 0.002% Drop Rate:
                 256 Flow    10k Flow    1m Flow
   VPP-17.04:    7.8 MP/s    7.3 MP/s    5.2 MP/s
   VPP-17.07:    7.7 MP/s    2.7 MP/s    1.8 MP/s
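
For a quick sense of scale, the relative drop per flow count can be worked
out from the figures above. A small Python sketch; the dictionary layout
and the helper name pct_drop are just for illustration:

```python
# Throughput in MP/s as quoted above: (VPP-17.04, VPP-17.07).
results = {
    "256 Flow": (7.8, 7.7),
    "10k Flow": (7.3, 2.7),
    "1m Flow": (5.2, 1.8),
}


def pct_drop(old, new):
    """Relative drop from old to new, in percent of old."""
    return (old - new) / old * 100.0


for name, (v1704, v1707) in results.items():
    print(f"{name}: {pct_drop(v1704, v1707):.1f}% lower in 17.07")
```

That works out to about 1% for 256 flows, and about 63% and 65% for the
10k and 1m flow tests.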

The performance team re-ran some of the tests for me with some additional data 
collected. It looks like the size of the L2 FIB table was reduced in 17.07. 
Below is the number of entries in the MAC table after the tests were run:
   17.04:
     show l2fib
     4000008 l2fib entries
   17.07:
     show l2fib
     1067053 l2fib entries with 1048576 learned (or non-static) entries

This caused more packets to be flooded (see the output of 'show node counters' 
below). I looked but couldn't find anything. Is the size of the L2 FIB table 
configurable?

Thanks,
Billy McFall


17.04:

show node counters
   Count                    Node                  Reason
:
 313035313                l2-input                L2 input packets
    555726                l2-flood                L2 flood packets
:
 310115490                l2-input                L2 input packets
    824859                l2-flood                L2 flood packets
:
 313508376                l2-input                L2 input packets
   1041961                l2-flood                L2 flood packets
:
 313691024                l2-input                L2 input packets
    698968                l2-flood                L2 flood packets

17.07:

show node counters
   Count                    Node                  Reason
:
  97810569                l2-input                L2 input packets
  72557612                l2-flood                L2 flood packets
:
  97830674                l2-input                L2 input packets
  72478802                l2-flood                L2 flood packets
:
  97714888                l2-input                L2 input packets
  71655987                l2-flood                L2 flood packets
:
  97710374                l2-input                L2 input packets
  70058006                l2-flood                L2 flood packets


--
Billy McFall
SDN Group
Office of Technology
Red Hat
_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev




