[vpp-dev] VPP and Core allocation
Hello,

We are integrating VPP into our product, and one question we had was about the recommended core allocation. With DPDK, the recommendation is to allocate one physical core to it; when hyper-threaded cores are used, one logical core is allocated to DPDK and its sibling core is left unassigned. Is there a similar recommendation for VPP? Should we assign VPP to a specific vcore and leave the sibling core unassigned to any other thread or process?

Regards,
Prashanth

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#16091): https://lists.fd.io/g/vpp-dev/message/16091
Mute This Topic: https://lists.fd.io/mt/73066645/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-
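For reference, CPU pinning in VPP is done in the `cpu` stanza of startup.conf. A minimal sketch follows; the core numbers are illustrative assumptions, not a recommendation, and the hyperthread-sibling question itself is what the message above asks about:

```
cpu {
  # Pin the VPP main thread to core 1 (example core number).
  main-core 1
  # Pin worker threads to cores 2-3; to mimic the DPDK practice,
  # one could leave the HT siblings of these cores unassigned.
  corelist-workers 2-3
}
```

Whether leaving the sibling cores idle helps depends on the workload; the stanza only controls where VPP's own threads land.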
[vpp-dev] FD.io Maintenance: Tuesday April 28, 2020 1800 UTC to 2200 UTC
When: Tuesday April 28, 2020 1800 UTC to 2200 UTC

What: Linux Foundation upgrades, OS updates, and security patches

* Jenkins
  o Upgrade to 2.222.x
  o OS and security updates
  o Plugin updates
* Nexus
  o Upgrade to 2.14.17-01
  o OS and security updates
* Jira
  o Upgrade to 8.6.1
  o OS and security updates
* Gerrit
  o Upgrade to 3.1.x
  o OS and security updates
* OpenGrok
  o OS and security updates

Impact: Gerrit will be unavailable for all or part of the window as we do the following:

* Convert the existing database to NoteDB
* Upgrade Gerrit to 3.1.x
* Rebuild indexes

The standard upgrades and maintenance will require a reboot of each FD.io system. Jenkins will be placed in shutdown mode at 1700 UTC. Running jobs will be aborted at 1800 UTC; please let us know if that will cause a problem for your project.

The following systems will be unavailable during the maintenance window:

* Jenkins sandbox
* Jenkins production
* Nexus
* Jira
* Gerrit
* OpenGrok

View/Reply Online (#16090): https://lists.fd.io/g/vpp-dev/message/16090
Re: [vpp-dev] Deadlock between NAT threads when frame queues for handoff are congested
Hi Ole!

Thanks, here is a change doing that, please have a look: https://gerrit.fd.io/r/c/vpp/+/26544

With this change, an assertion will fail if the number of threads is greater than 55 or so. To make things work for such large thread counts it would also be necessary to increase the queue size; this change does not handle that.

Best regards,
Elias

On Thu, 2020-04-16 at 13:43 +0200, Ole Troan wrote:
> Hi Elias,
>
> Thank you for the thorough analysis.
> I think the best approach for now is the one you propose. Reserve as
> many slots as you have workers.
> Potentially also increase the queue size > 64.
>
> Damjan is looking at some further improvements in this space, but for
> now please go with what you propose.
>
> Best regards,
> Ole

View/Reply Online (#16089): https://lists.fd.io/g/vpp-dev/message/16089
Re: [vpp-dev] Deadlock between NAT threads when frame queues for handoff are congested
Hi Elias,

Thank you for the thorough analysis. I think the best approach for now is the one you propose: reserve as many slots as you have workers. Potentially also increase the queue size beyond 64.

Damjan is looking at some further improvements in this space, but for now please go with what you propose.

Best regards,
Ole

> On 15 Apr 2020, at 14:26, Elias Rudberg wrote:
>
> Hello VPP experts,
>
> We are using VPP for NAT44, and last week we encountered a problem where
> some VPP threads stopped forwarding traffic. We saw the problem on two
> separate VPP servers within a short time; apparently it was triggered by
> some specific kind of out2in traffic that arrived at that time.
>
> As far as I can tell, this issue exists in both the current master
> branch and in the 1908 and 2001 branches.
>
> After investigating and finally being able to reproduce the problem in a
> lab setting, we came to the following conclusion about what happened.
>
> The scenario is that several threads (8 threads in our case) are used
> for NAT, and the frame queues for handoff between threads become
> congested for some of the threads. This can be triggered, for example,
> by "garbage" out2in traffic arriving at some port: if much of the out2in
> traffic has the same destination port, much of it will be handed off to
> the same thread, since the out2in handoff thread index is chosen based
> on the destination port. It does not matter whether the traffic belongs
> to any existing NAT sessions, since handoff must happen before that
> check, and the problem is in the handoff itself.
>
> When a frame queue is congested, that is supposed to be detected by the
> is_vlib_frame_queue_congested() call in vlib_buffer_enqueue_to_thread().
> However, that check is not completely reliable, since other threads may
> add things to the queue after the check. For example, two threads can
> call is_vlib_frame_queue_congested() simultaneously and both conclude
> that the queue is not congested, when in fact it will be congested once
> one of them has added to the queue, giving trouble for the other thread.
> This is to some extent mitigated by the fact that the check in
> is_vlib_frame_queue_congested() uses a "queue_hi_thresh" value that is
> set slightly lower than the number of elements in the queue:
>
>   fqm->queue_hi_thresh = frame_queue_nelts - 2;
>
> The -2 there means that things are still OK if two threads call
> is_vlib_frame_queue_congested() simultaneously, but if three or four
> threads do it simultaneously we are in trouble anyway, and that seems to
> be what happened on our VPP servers last week. This leads to one or more
> threads being stuck in an infinite loop, the loop that looks like this
> in vlib_get_frame_queue_elt():
>
>   /* Wait until a ring slot is available */
>   while (new_tail >= fq->head_hint + fq->nelts)
>     vlib_worker_thread_barrier_check ();
>
> The loop above is supposed to end when a different thread changes the
> value of the volatile variable fq->head_hint, but that will not happen
> if the other thread is also stuck in this loop. We get a deadlock: A is
> waiting for B and B is waiting for A. In the context of NAT, thread A
> wants to hand off something to thread B at the same time as thread B
> wants to hand off something to thread A, while at the same time their
> frame queues are congested. This leads to those two threads being stuck
> in the loop forever, each of them waiting for the other one.
>
> To me it looks like the subtraction by 2 when setting queue_hi_thresh is
> just an ad hoc choice; there is no reason why 2 would be enough. I think
> that to make it safe, we need to subtract the number of threads.
> Essentially, we need to ensure that there is room for each thread to
> reserve one extra element in the queue, so that no thread can get stuck
> waiting in the loop above. I tested this by hard-coding -8 instead of
> -2, and then the problem could no longer be reproduced, so that fix
> seems to work. The frame_queue_nelts value is 64, so using -8 means the
> queue is considered congested already at 56 elements instead of 62 as it
> is now.
>
> What do you think, is it a good solution to check the number of threads
> and use that to set
> "fqm->queue_hi_thresh = frame_queue_nelts - n_threads;"?
>
> Best regards,
> Elias

View/Reply Online (#16088): https://lists.fd.io/g/vpp-dev/message/16088
[vpp-dev] Setting IPFIX collector port does not work as expected
Hi all,

I've noticed that if the IPFIX collector port is set to a non-default value (i.e. something other than the default 4739), packets carrying templates arrive at the configured port, but packets carrying data still go to port 4739.

Bug in Jira with more details: https://jira.fd.io/browse/VPP-1859

Thanks,
Andrii

View/Reply Online (#16087): https://lists.fd.io/g/vpp-dev/message/16087
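For context, a configuration like the following sketch should exhibit the reported behavior; the addresses and port are illustrative, and the exact option spelling should be checked against your VPP version's `set ipfix exporter` CLI help:

```
set ipfix exporter collector 192.0.2.1 port 4740 src 192.0.2.2
```

Per the report, templates would then arrive at 4740 while data records still show up on 4739.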
Re: [vpp-dev] frp_preference and frp_weight size #vnet
Hi Dimitar,

In VPP's FIB, weight and preference are attributes of a path, not of a route. The weight controls [un]equal-cost load-balancing across the paths, and the preference controls which paths to use when they are [un]available (i.e. BFD down), a kind of poor man's fast re-route.

It's my understanding that the Linux preference/metric applies to the route and is set based on the admin distance of the route's originator (e.g. static, OSPF, etc). The route with the best metric/AD is the one used for forwarding; this route arbitration is performed by a RIB. Only the route with the best metric/AD should be programmed into a forwarding table and hence programmed into VPP's FIB.

/neale

From: on behalf of Dimitar Ivanov
Date: Wednesday 15 April 2020 at 14:52
To: "vpp-dev@lists.fd.io"
Subject: [vpp-dev] frp_preference and frp_weight size #vnet

Hi,

I'm working with version 19.08 and see something that I cannot understand. Currently, in the FIB route path definition, preference and weight are u8. At the same time, the preference (metric) in the Linux kernel is a 32-bit value. My goal is to inject routes into VPP and keep the same parameters as in the Linux kernel, so here I face a conflict. This is most visible when examining the IPv6 routing table in Linux, where the default metric/preference of the default route is 1024.

The question is how to overcome this problem, given that the Linux kernel can have routes with a metric bigger than 255 while the VPP FIB does not allow this. I saw that patch https://gerrit.fd.io/r/c/vpp/+/7586 changed preference and weight from u32 to u16; now weight and preference are u8. Are there plans for the VPP FIB to accept a preference bigger than u8? Is there some workaround for this problem?

Regards,
Dimitar

View/Reply Online (#16086): https://lists.fd.io/g/vpp-dev/message/16086