Re: [vpp-dev] vpp+dpdk #dpdk

2022-12-19 Thread zheng jie
I have never seen two net devices, even SR-IOV devices, share the same PCI address. 
Could you dump your devices via lspci or /sys/… ? PCI bus addresses are 
always unique.
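
For example, something like this (the interface name is taken from your screenshot
and is only an illustration):

lspci -D | grep -i ethernet                # list Ethernet devices with their full PCI addresses
readlink /sys/class/net/enp6s0f01d/device  # show the PCI device the kernel maps this netdev to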

Personally, I suspect the PCI addresses in your screenshot are inaccurate.
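
For what it's worth, once each port reports its own unique PCI bus address, handing 
both to VPP/DPDK is usually just a startup.conf stanza along these lines (the 
addresses below are placeholders):

dpdk {
  dev 0000:06:00.0
  dev 0000:06:00.1
}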


From:  on behalf of "first_se...@163.com" 

Reply-To: "vpp-dev@lists.fd.io" 
Date: Monday, December 12, 2022 at 11:16 PM
To: "vpp-dev@lists.fd.io" 
Subject: [vpp-dev] vpp+dpdk #dpdk

I have an issue where two devices with different names show the same buf_info, 
as in the picture below. What should I do to bind the two devices called enp6s0f01d 
and enp6s0f02d? Thanks.
[screenshot: image001.png]




Re: [vpp-dev] mellanox mlx5 + rdma + lcpng + bond - performance (tuning ? or just FIB/RIB processing limit) (max performance pps about 2Mpps when packet drops starts)

2022-12-19 Thread Matthew Smith via lists.fd.io
Hi Paweł,

On Sat, Dec 17, 2022 at 6:28 PM Paweł Staszewski 
wrote:

> Hi
>
>
> So without bgp (lcp) and only one static route, performance is basically as
> advertised on "paper": 24Mpps / 100Gbit/s, without problems.
>
> And then, with lcp, the problems start, no matter whether bond is used or
> not.
>

When you tried it without bgp, did you still use lcp to manage
interfaces/addresses and add the single static route? If not, could you try
that and report whether the problem still occurs?

How many routes is your BGP daemon adding to VPP's FIB?

Are you isolating the cores that worker threads are bound to (e.g. via
isolcpus kernel argument or via cpuset)?
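
For example, with workers pinned like this in startup.conf (core numbers are
placeholders and need to match your CPU topology):

cpu {
  main-core 1
  corelist-workers 2-5
}

the kernel command line would typically carry isolcpus=2-5 (optionally also
nohz_full=2-5 rcu_nocbs=2-5) so nothing else gets scheduled on the worker cores.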


>
> Basically, the side where I'm receiving most of the traffic that needs to be
> TX-ed to the other interface is OK.
>
> The interface with most RX traffic, on vlan 2906, has an IP address, and when I
> ping from this IP address to the point-to-point IP on the other side, there are
> no packet drops.
>
> But on the interface where the traffic that was RX-ed on vlan 2906 needs to be
> TX-ed on vlan 514, there are drops to the point-to-point IP of the other
> side - from 10 to 20%.
>
> The same happens when I ping/mtr from the RX side to the TX side - there are
> drops - but there are no drops when I ping from the TX side to the RX side,
> since in that direction the forwarding on the other side goes through the
> interface that has the most RX and less TX.
>

How are you measuring packet loss? You mentioned 10 to 20% drops by ping &
mtr above. Are those tools all you're using, or are you running some
traffic generator like TRex? My reason for asking is that when I look at
the 'show runtime' output you sent, the number of packets ("vectors")
handled by rdma-input on each thread, compared to the number of packets
handled by enp59s0f0-rdma-output and enp59s0f1-rdma-output on the
same thread, differs by much less than 10-20%. So some
more specific information on how you're measuring that there is 10-20%
packet loss would be useful. Is the 10-20% packet loss only observed when
communicating directly to the interface addresses on the host system or are
10-20% of packets which should be forwarded between interfaces by VPP being
dropped?

I notice you mentioned "lcpng", which is a customized version of linux-cp.
I'm not sure of the differences between the stock versions of
linux-cp/linux-nl and the code in lcpng. Have you tried this experiment
with the stock version of linux-cp/linux-nl from the VPP master branch on
gerrit?
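
That would roughly mean something like this in startup.conf instead of the lcpng
plugin (plugin file names assume a default build/install layout):

plugins {
  plugin linux_cp_plugin.so { enable }
  plugin linux_nl_plugin.so { enable }
}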

Also, as Ben previously requested, the output of 'show errors' and  'show
hardware-interfaces' would be helpful.

Thanks,
-Matt

> So it looks like the interface busy with RX traffic is OK - the problem is when
> an interface is mostly TX-ing traffic RX-ed from the other interface... but I don't
> know how to check what is causing it... ethtool -S for any interface is
> showing no errors/drops at the interface level.
>
>
>
>
> On 12/16/22 10:51 AM, Benoit Ganne (bganne) via lists.fd.io wrote:
>
> Hi,
>
>
> So the hardware is:
> Intel 6246R
> 96GB ram
> Mellanox ConnectX-5 2x 100Gb Ethernet NIC
> And a simple configuration with vpp/frr where all traffic is RX-ed on one vlan
> interface and TX-ed on a second vlan interface -
> it is normal internet traffic - about 20Gbit/s at 2Mpps
>
> 2Mpps definitely looks too low; in a similar setup, CSIT measures IPv4 NDR 
> with rdma at ~17.6Mpps with 2 workers on 1 core (2 hyperthreads): 
> http://csit.fd.io/trending/#eNrlkk0OwiAQhU-DGzNJwdKuXFh7D0NhtE36QwBN6-mljXHahTt3LoCQb-Y95gUfBocXj-2RyYLlBRN5Y-LGDqd9PB7WguhBtyPwJLmhsFyPUmYKnOkUNDaFLK2Aa8BQz7e4KuUReuNmFXGeVcw9bCSJ2Hoi8t2IGpRDRR3RjVBAv7LZvoeqrk516JsnUmmcgLiOeRDieqsfJrui7yHzcqn4XXj2H8Kzn_BkuesH1y0_UJYvWG6xEg
>
> The output of 'sh err' and 'sh hard' would be useful too.
>
>
> Below vpp config:
>
> To start with, I'd recommend doing a simple test removing lcp, vlan & bond to 
> see if you can reproduce CSIT performance, and then maybe add bond and 
> finally lcp and vlan. This could help narrow down where the performance drops.
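> For example, something along these lines as the entire data-plane config (interface
> names and addresses are placeholders):
>
> create interface rdma host-if enp59s0f0 name rdma-0
> set interface state rdma-0 up
> set interface ip address rdma-0 192.0.2.1/24
> ip route add 198.51.100.0/24 via 192.0.2.2
>
> plus the equivalent for the second port, and then compare against the CSIT numbers.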
>
>
> Below also show run
>
> The vector rate is really low, so it is really surprising there are drops...
> Did you capture the show run output while packets were being dropped? Basically, 
> when traffic is going through VPP and performance is maxing out, do 'cle run' 
> and then 'sh run' to see the instantaneous values and not averages.
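> For example (the vpp# prompt is shown just for illustration):
>
> vpp# cle run
>   ... leave traffic running at the maximum rate for a few seconds ...
> vpp# sh run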
>
>
> Does anyone know how to interpret this data? What are the Suspends for
> api-rx-from-ring?
>
> This is a control plane task in charge of processing API messages. VPP uses 
> cooperative multitasking within the main thread for control plane tasks; 
> Suspends counts the number of times this specific task voluntarily released 
> the CPU, yielding to other tasks.
>
>
> and how to check what type of error (traffic) is causing the drops:
>
> You can capture dropped traffic:
> pcap trace drop
> 
> pcap trace drop off
>
> You can also use VPP packet tracer:
> tr add rdma-input 1000
> 
> tr filter include error-drop 1000
> sh tr max 1000
>
> Best
> ben
>
>
>
> 
>
>


Re: [vpp-dev] Error message on starting vpp

2022-12-19 Thread Nathan Skrzypczak
Hi Xiaodong,

It seems like a '.runs_after' was introduced on a node that does not belong
to the arc in question. [0] should solve this.
(CCing Julian, as he is the author of the original patch, and the sr plugin
maintainers)

Cheers
-Nathan

[0] https://gerrit.fd.io/r/c/vpp/+/37837
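
If you want to try it before it merges, you can cherry-pick the change locally with
something like this (the trailing /1 patchset number may differ; check the download
links on the change page):

git fetch https://gerrit.fd.io/r/vpp refs/changes/37/37837/1 && git cherry-pick FETCH_HEAD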

On Sat, Dec 3, 2022 at 19:29, Xiaodong Xu  wrote:

> Hi VPP experts,
>
> I got the following error message when starting vpp recently:
>
> 0: vnet_feature_arc_init:272: feature node 'ip6-lookup' not found (before
> 'pt', arc 'ip6-output')
>
> I'm using the master branch from the VPP git repo. By checking the source code, it
> seems it might have something to do with the commits
> https://github.com/FDio/vpp/commit/b79d09bbfa93f0f752f7249ad27a08eae0863a6b
> and
> https://github.com/FDio/vpp/commit/39d6deca5f71ee4fe772c10d76ed5b65d1ebec44
>
> So I removed the two commits from my local repo and the issue is gone.
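> (For the record, dropping them locally can be done with something like
> git revert b79d09bbfa93f0f752f7249ad27a08eae0863a6b 39d6deca5f71ee4fe772c10d76ed5b65d1ebec44
> or any equivalent way of removing the two commits.)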
>
> The message seems to be harmless, but if anyone who is familiar with the
> commits can take a look, I'd appreciate it.
>
> Regards,
> Xiaodong
>
> 
>
>




Re: [vpp-dev] mellanox mlx5 + rdma + lcpng + bond - performance (tuning ? or just FIB/RIB processing limit) (max performance pps about 2Mpps when packet drops starts)

2022-12-19 Thread Benoit Ganne (bganne) via lists.fd.io
> Basically it looks like it is an lcp and routing problem - I don't know how these
> tests are done for lcp, but it looks like those tests go like this:
> 1. load 900k routes
> 2. connect a traffic generator and push 10Mpps or more (without checking whether
> you get any reply) from one single ip src to one single dst and
> voila... but real traffic does not work like this :)

Not sure which tests you refer to, but we run different kinds of tests in CSIT. 
The one I linked to earlier is the most basic one, with a few routes and packets 
matching those routes - basically everything stays in cache, so it is the fastest.
It's a good test to check whether there is a problem with the hardware etc.
But we also have other tests where we make sure we keep hitting routes outside 
of the cache too: 
http://csit.fd.io/trending/#eNrlkk0OwiAQhU-DGzNJwdKuXKi9h0EYbWN_CKBpPb20MU6b6M6dCyDkm3mPecGHzuHRY71lcs_yPRN5ZeLGNrt1PO7WgmhB1z3wJLmgsFz3UmYKnGkUVDaFLD0B14ChHG9xea1qFMkVXGugNW5UE4dRzdzCQpqILQci3w2pQTlU1BFdCQX0M5vP76Lqs1MN-uqB1BInIq5jPoS4XvqEwc7oa9i8mCp-H6b9pzDtO0xZrNrONdMPlcUTMTq7vg
This one uses 20K routes and random prefixes.

I expect things to be a bit slower because of bond + vlan (which both take 
some processing), but it should not be more than 10%...
 
> So it looks like the interface busy with RX traffic is OK - the problem is
> when an interface is mostly TX-ing traffic RX-ed from the other interface... but
> I don't know how to check what is causing it... ethtool -S for any interface
> is showing no errors/drops at the interface level.

Is that between 2 different physical interfaces, in 2 different PCI slots? If 
so, are they connected to different NUMA nodes?
Can you share the output of 'sh err', 'sh hard', 'sh pci', 'sh thr'?
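
You can check the NUMA node of each port with something like this (the PCI addresses 
are placeholders; use the ones reported by 'sh pci'):

cat /sys/bus/pci/devices/0000:3b:00.0/numa_node
cat /sys/bus/pci/devices/0000:3b:00.1/numa_node

and compare that with the NUMA placement of the workers shown by 'sh thr'.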

Best
ben
