On Mon, Aug 28, 2017 at 8:53 AM, Maciek Konstantynowicz (mkonstan) <mkons...@cisco.com> wrote:
> + csit-dev
>
> Billy,
>
> Per last week's CSIT project call, from the CSIT perspective we
> classified your reported issue as a test coverage escape.
>
> Summary
> =======
> CSIT test coverage got fixed, see more detail below. The CSIT tests
> uncovered a regression for L2BD with MAC learning with a higher total
> number of MACs in L2FIB (>>10k MACs) in multi-threaded configurations.
> Single-threaded configurations seem not to be impacted.
>
> Billy, Karl, can you confirm this aligns with your findings?

When you say "multi-threaded configuration", I assume you mean multiple
worker threads? Karl's tests had 4 workers, one for each NIC (physical
and vhost-user). He only tested multi-threaded, so we cannot confirm
that single-threaded configurations are not impacted.

Our numbers are a little different from yours, but we are both seeing
drops between releases. We had a bigger drop-off with 10k flows, but the
million-flow results seem to be similar. I was a little disappointed
that the MAC limit change by John Lo on 8/23 didn't improve the master
numbers somewhat.

Thanks for all the hard work and for adding these additional test cases.

Billy

> More detail
> ===========
> MAC scale tests have now been added to the L2BD and L2BD+vhost CSIT
> suites, as a simple extension of the existing L2 testing suites. Some
> known issues with the TG prevented CSIT from adding those tests in the
> past, but now that the TG issues have been addressed, the tests could
> be added swiftly. The complete list of added tests is in [1] - thanks
> to Peter Mikus for great work there!
>
> Results from running those tests multiple times within the FD.io CSIT
> lab infra can be glanced over by checking the dedicated test trigger
> commits [2][3][4] and the summary graphs in the linked xlsx [5]. The
> results confirm there is a regression in VPP l2fib code affecting all
> scaled-up MAC tests in multi-thread configurations. Single-thread
> configurations seem not to be impacted.
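As a side note on mechanism, here is a minimal, purely illustrative sketch (not VPP code; integer IDs stand in for MACs and ports) of why hitting a learned-MAC cap in an L2 bridge domain translates directly into flooding:

```python
# Toy model of L2 bridge-domain MAC learning: once the learn limit is
# reached, MACs above the limit can never be learned, so traffic to
# them is flooded on every pass instead of being switched.
class BridgeDomain:
    def __init__(self, fib_limit):
        self.fib = {}              # dst "MAC" -> output port
        self.fib_limit = fib_limit
        self.flooded = 0
        self.switched = 0

    def forward(self, src_mac, dst_mac, in_port):
        # Learn the source MAC, subject to the table limit.
        if src_mac not in self.fib and len(self.fib) < self.fib_limit:
            self.fib[src_mac] = in_port
        # Switch if the destination is known, otherwise flood.
        if dst_mac in self.fib:
            self.switched += 1
        else:
            self.flooded += 1

# 20k MACs against a 10k-entry limit: half the MACs can never be
# learned, so half of all traffic floods forever.
bd = BridgeDomain(fib_limit=10_000)
for _ in range(3):                       # a few "passes" of traffic
    for i in range(20_000):
        bd.forward(src_mac=i, dst_mac=(i + 10_000) % 20_000, in_port=i % 2)
print(bd.flooded, bd.switched)           # half the packets flood
```

This is only the qualitative shape of the problem; VPP's l2fib is a bihash with per-thread learning paths, which is where the multi-thread regression lives.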
> The tests in commit [1] are not merged yet, as they are waiting for the
> TG/TRex team to fix a TRex issue with mis-calculating the Ethernet FCS
> with a large number of L2 MAC flows (>10k MAC flows). The issue is
> tracked by [6]; TRex v2.29 with the fix has an ETA of w/e 1-Sep, i.e.
> this week. The reported CSIT test results use Ethernet frames with UDP
> headers, which masks the TRex issue.
>
> We have also git-bisected vpp for the problem between v17.04 (good) and
> v17.07 (bad) in a separate IXIA-based lab in SJC, and found the culprit
> vpp patch [7]. Awaiting a fix from vpp-dev; jira ticket raised [8].
>
> Many thanks for reporting this regression and working with CSIT to plug
> this hole in testing.
>
> -Maciek
>
> [1] CSIT-786 L2FIB scale testing [https://gerrit.fd.io/r/#/c/8145/
>     ge8145] [https://jira.fd.io/browse/CSIT-786 CSIT-786];
>     L2FIB scale testing for 10k, 100k, 1M FIB entries
>     ./l2:
>     10ge2p1x520-eth-l2bdscale10kmaclrn-ndrpdrdisc.robot
>     10ge2p1x520-eth-l2bdscale100kmaclrn-ndrpdrdisc.robot
>     10ge2p1x520-eth-l2bdscale1mmaclrn-ndrpdrdisc.robot
>     10ge2p1x520-eth-l2bdscale10kmaclrn-eth-2vhostvr1024-1vm-cfsrr1-ndrpdrdisc
>     10ge2p1x520-eth-l2bdscale100kmaclrn-eth-2vhostvr1024-1vm-cfsrr1-ndrpdrdisc
>     10ge2p1x520-eth-l2bdscale1mmaclrn-eth-2vhostvr1024-1vm-cfsrr1-ndrpdrdisc
> [2] VPP master branch [https://gerrit.fd.io/r/#/c/8173/ ge8173];
> [3] VPP stable/1707 [https://gerrit.fd.io/r/#/c/8167/ ge8167];
> [4] VPP stable/1704 [https://gerrit.fd.io/r/#/c/8172/ ge8172];
> [5] CSIT-794 VPP v17.07 L2BD yields lower NDR and PDR performance vs.
>     v17.04, 20170825_l2fib_regression_10k_100k_1M.xlsx,
>     [https://jira.fd.io/browse/CSIT-794 CSIT-794];
> [6] TRex v2.28 Ethernet FCS mis-calculation issue
>     [https://jira.fd.io/browse/CSIT-793 CSIT-793];
> [7] commit 25ff2ea3a31e422094f6d91eab46222a29a77c4b;
> [8] VPP v17.07 L2BD NDR and PDR multi-thread performance broken
>     [https://jira.fd.io/browse/VPP-963 VPP-963];
>
> On 14 Aug 2017, at 23:40, Billy McFall <bmcf...@redhat.com> wrote:
>
> In the last VPP call, I reported that some internal Red Hat performance
> testing was showing a significant drop in performance between releases
> 17.04 and 17.07. This is with l2-bridge testing, PVP, 0.002% drop rate:
>
>     VPP 17.04: 256 Flow 7.8 Mpps, 10k Flow 7.3 Mpps, 1m Flow 5.2 Mpps
>     VPP 17.07: 256 Flow 7.7 Mpps, 10k Flow 2.7 Mpps, 1m Flow 1.8 Mpps
>
> The performance team re-ran some of the tests for me with some
> additional data collected. It looks like the size of the L2 FIB table
> was reduced in 17.07. Below are the numbers of entries in the MAC
> table after the tests are run:
>
>     17.04:
>         show l2fib
>         4000008 l2fib entries
>     17.07:
>         show l2fib
>         1067053 l2fib entries with 1048576 learned (or non-static) entries
>
> This caused more packets to be flooded (see the output of 'show node
> counters' below). I looked but couldn't find anything. Is the size of
> the L2 FIB table configurable?
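Worth noting about the figures just quoted: the 17.07 "learned" count is exactly 2^20, consistent with the learned-entry capacity having been capped at 1M in 17.07, versus the 4M+ entries the same test left in the 17.04 table. A quick back-of-the-envelope script (numbers copied verbatim from the thread):

```python
# The 17.07 learned-entry count is a power of two -- it looks like a cap.
learned_17_07 = 1_048_576
assert learned_17_07 == 2**20

# Throughput per release (Mpps), from the PVP table above.
rates = {"256": (7.8, 7.7), "10k": (7.3, 2.7), "1m": (5.2, 1.8)}
for flows, (v1704, v1707) in rates.items():
    drop = 100 * (v1704 - v1707) / v1704
    print(f"{flows:>4} flows: {drop:.0f}% drop from 17.04 to 17.07")
```

So the 256-flow case (well under any cap) loses about 1%, while the 10k- and million-flow cases lose roughly 63-65%, matching the flooding explanation below.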
> Thanks,
> Billy McFall
>
> 17.04:
>
> show node counters
>     Count          Node          Reason
>     :
>     313035313      l2-input      L2 input packets
>     555726         l2-flood      L2 flood packets
>     :
>     310115490      l2-input      L2 input packets
>     824859         l2-flood      L2 flood packets
>     :
>     313508376      l2-input      L2 input packets
>     1041961        l2-flood      L2 flood packets
>     :
>     313691024      l2-input      L2 input packets
>     698968         l2-flood      L2 flood packets
>
> 17.07:
>
> show node counters
>     Count          Node          Reason
>     :
>     97810569       l2-input      L2 input packets
>     72557612       l2-flood      L2 flood packets
>     :
>     97830674       l2-input      L2 input packets
>     72478802       l2-flood      L2 flood packets
>     :
>     97714888       l2-input      L2 input packets
>     71655987       l2-flood      L2 flood packets
>     :
>     97710374       l2-input      L2 input packets
>     70058006       l2-flood      L2 flood packets
>
> --
> Billy McFall
> SDN Group
> Office of Technology
> Red Hat
> _______________________________________________
> vpp-dev mailing list
> vpp-dev@lists.fd.io
> https://lists.fd.io/mailman/listinfo/vpp-dev

--
Billy McFall
SDN Group
Office of Technology
Red Hat
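To put the 'show node counters' output quoted above in perspective, here is the same data reduced to flood ratios (counter values copied verbatim from the thread; the four samples per release are simply pooled):

```python
# (l2-input, l2-flood) counter pairs per worker, from the thread above.
counters_1704 = [(313035313, 555726), (310115490, 824859),
                 (313508376, 1041961), (313691024, 698968)]
counters_1707 = [(97810569, 72557612), (97830674, 72478802),
                 (97714888, 71655987), (97710374, 70058006)]

def flood_pct(samples):
    """Percentage of l2-input packets that were flooded, pooled."""
    total_in = sum(i for i, _ in samples)
    total_flood = sum(f for _, f in samples)
    return 100 * total_flood / total_in

print(f"17.04: {flood_pct(counters_1704):.2f}% flooded")   # well under 1%
print(f"17.07: {flood_pct(counters_1707):.2f}% flooded")   # roughly 3 in 4
```

Flooding going from a fraction of a percent to roughly three quarters of all bridged traffic is more than enough to explain the throughput drop in the scaled tests.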
_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev