Hi Colin,

Your comments were not taken as criticism ☺ Constructive comments are always 
greatly appreciated.

Apart from the non-MP-safe APIs Florin mentioned, and the route add/del cases I 
covered, the consensus is certainly that packet loss should not occur during a 
‘typical’ update, and we will do what we can to address it.
Could you give us* some specific examples of the operations during which you 
see packet loss?

Thanks,
Neale

*I say ‘us’ not ‘me’ as I’m about to hit the beach for a couple of weeks.


From: Colin Tregenza Dancer <c...@metaswitch.com>
Date: Tuesday, 22 August 2017 at 14:24
To: "Neale Ranns (nranns)" <nra...@cisco.com>, Florin Coras 
<fcoras.li...@gmail.com>
Cc: "vpp-dev@lists.fd.io" <vpp-dev@lists.fd.io>
Subject: RE: [vpp-dev] Packet loss on use of API & cmdline

Hi Neale,

Thanks for the reply, and please don’t take my comments as a criticism of what 
I think is a great project.  I’m just trying to understand whether the packet 
loss I’m observing when I do things like adding new tunnels, setting up routes, 
etc., is generally viewed as acceptable, or whether it’s an area people are 
interested in changing.

Specifically, I’m looking at a range of tunnel/gateway applications, and am 
finding that whilst static operation is great from a packet loss perspective, 
when I add/remove tunnels, routes, etc. (something which in my application is 
to be expected on a regular basis) the existing flows undergo significant 
packet loss.  For comparison, on most hardware-based routers/gateways this 
doesn’t occur, and existing flows continue unaffected.

Cheers,

Colin.

From: Neale Ranns (nranns) [mailto:nra...@cisco.com]
Sent: 22 August 2017 13:44
To: Colin Tregenza Dancer <c...@metaswitch.com>; Florin Coras 
<fcoras.li...@gmail.com>
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Packet loss on use of API & cmdline

Hi Colin,

The instances of barrier syncs you have correctly identified occur only in the 
exceptional cases of route addition/deletion, not in the typical case:

-          adj_last_lock_gone() is called when that adjacency is no longer 
required, i.e. we are removing the last route, or perhaps the ARP entry, for a 
neighbour we presumably no longer have.
-          adj_nbr_update_rewrite_internal() is called when the adjacency 
transitions from incomplete (no associated MAC rewrite) to complete.
-          The fix for 892 applies when a route is added that is the first to 
create a new edge/arc in the VLIB node graph; in the case of that JIRA ticket, 
it was the first recursive route. Edges are never removed, so this is a 
once-per-reboot event (see the sketch below).
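
To make that last case concrete, a minimal sketch of the kind of protection 
involved (the wrapper name is made up; vlib_node_add_next() and the barrier 
calls are real VLIB APIs, but this is not the actual 892 patch):

/* Hedged sketch: allocating a new edge rewrites the parent node's
 * next-index tables, so all workers are parked while it happens. */
static u32
fib_new_edge_sketch (vlib_main_t * vm, u32 parent_index, u32 child_index)
{
  u32 next_index;

  vlib_worker_thread_barrier_sync (vm);   /* park all worker threads */
  next_index = vlib_node_add_next (vm, parent_index, child_index);
  vlib_worker_thread_barrier_release (vm);

  return next_index;  /* cached thereafter, so the cost is once per reboot */
}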

But in the typical case of adding routes, e.g. a BGP/OSPF convergence event, 
the adjacencies are present and complete and the VLIB graph is already set up, 
so the routes will be added in a lock/barrier-free manner.

Pre-building the VLIB graph of all possibilities is wasteful IMHO, and given 
the one-time-only lock, an acceptable trade-off.
Adjacencies are more complicated. The state of the adjacency, incomplete or 
complete, determines the VLIB node the packet should go to. So one needs to 
atomically change the state of the adjacency and the state of the routes that 
use it - hence the barrier. We could solve that with indirection, but it would 
be indirection in the data-path, and that costs cycles. So, again, given the 
relative rarity of such an adjacency state change, the trade-off was to 
barrier sync.
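
As a concrete illustration of that trade-off, the pattern is roughly as 
follows (a sketch only: adj_complete_sketch is not a real function, though the 
barrier calls, ip_adjacency_t and IP_LOOKUP_NEXT_REWRITE are genuine VPP 
symbols):

/* Illustrative sketch: the incomplete -> complete transition flips the
 * VLIB next node for the adjacency and restacks the routes that use it;
 * workers are held at the barrier so they never see a half-updated view. */
static void
adj_complete_sketch (vlib_main_t * vm, ip_adjacency_t * adj)
{
  vlib_worker_thread_barrier_sync (vm);             /* all workers parked */

  adj->lookup_next_index = IP_LOOKUP_NEXT_REWRITE;  /* incomplete -> complete */
  /* ... install the MAC rewrite and update dependent FIB entries ... */

  vlib_worker_thread_barrier_release (vm);          /* workers resume */
}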

Hth,
neale


From: <vpp-dev-boun...@lists.fd.io> on behalf of Colin Tregenza Dancer via 
vpp-dev <vpp-dev@lists.fd.io>
Reply-To: Colin Tregenza Dancer <c...@metaswitch.com>
Date: Tuesday, 22 August 2017 at 12:25
To: Florin Coras <fcoras.li...@gmail.com>
Cc: "vpp-dev@lists.fd.io" <vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] Packet loss on use of API & cmdline

Hi Florin,

Thanks for the quick, and very useful reply.

I’d been looking at the mp_safe flags, and had concluded that I’d need the 
calls I was interested in to be at least marked mp_safe.

However, I was thinking that wasn’t sufficient, as it appeared that some calls 
marked as mp_safe invoke barrier_sync lower down the call stack.  For instance, 
the internal functions adj_last_lock_gone(), adj_nbr_update_rewrite_internal() 
and vlib_node_serialize() all seem to call vlib_worker_thread_barrier_sync(), 
and the fix for defect 892 
https://jira.fd.io/browse/VPP-892?gerritReviewStatus=All#gerrit-reviews-left-panel
involves adding barrier calls in code related to the mp_safe ADD_DEL_ROUTE 
(which fits with the packet loss I’d observed when testing route deletion).

I think the raw lossless packet processing which vpp has achieved on static 
configs is truly amazing, but I guess what I’m trying to understand is whether 
it is viewed as important to achieve similar behaviour while the system is 
being reconfigured.  Personally I think many of the potential uses of a 
software dataplane include the need for limited-impact dynamic reconfiguration; 
however, maybe the kinds of applications I have in mind are in the minority?

More than anything, given the number of areas which would likely be touched by 
the required changes, I wanted to understand whether there is a consensus that 
such a change is even needed.

Thanks in advance for any insight you (or others) can offer.

Cheers,

Colin.



From: Florin Coras [mailto:fcoras.li...@gmail.com]
Sent: 22 August 2017 09:40
To: Colin Tregenza Dancer <c...@metaswitch.com>
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Packet loss on use of API & cmdline

Hi Colin,

Your assumption was right. More often than not, a binary API/CLI call results 
in a vlib_worker_thread_barrier_sync, because most handlers and CLIs are not 
mp-safe. As a consequence, vpp may experience packet loss.
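
For anyone following along, the dispatch-side logic is roughly as follows (a 
sketch from memory, not the verbatim dispatcher; the is_mp_safe and 
msg_handlers fields of api_main_t are real, the function name is made up):

/* Sketch of binary API dispatch: handlers not flagged is_mp_safe are
 * bracketed by a worker-thread barrier, which stalls packet processing
 * on all workers and is the source of the loss described above. */
static void
dispatch_msg_sketch (api_main_t * am, vlib_main_t * vm, u16 msg_id, void *msg)
{
  if (!am->is_mp_safe[msg_id])
    vlib_worker_thread_barrier_sync (vm);   /* park workers */

  am->msg_handlers[msg_id] (msg);           /* run the handler */

  if (!am->is_mp_safe[msg_id])
    vlib_worker_thread_barrier_release (vm);
}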

One way around this issue, for binary APIs, is to make sure the handler you’re 
interested in is thread safe and then mark it as is_mp_safe in api_main. See, 
for instance, VL_API_IP_ADD_DEL_ROUTE.

Hope this helps,
Florin

On Aug 22, 2017, at 1:11 AM, Colin Tregenza Dancer via vpp-dev 
<vpp-dev@lists.fd.io> wrote:

I might have just missed it, but looking through the ongoing regression tests I 
can’t see anything that explicitly tests for packet loss during CLI/API 
commands, so I’m wondering: is minimizing packet loss during configuration 
viewed as a goal for vpp?

Many/most of the real-world applications I’ve been exploring require the 
ability to reconfigure live systems without impacting the existing flows 
related to stable elements (route updates, tunnel add/remove, VM 
addition/removal), and it would be great to understand how this fits with vpp 
use cases.

Thanks again,

Colin.

From: vpp-dev-boun...@lists.fd.io 
[mailto:vpp-dev-boun...@lists.fd.io] On Behalf Of Colin Tregenza Dancer via 
vpp-dev
Sent: 19 August 2017 12:17
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] Packet loss on use of API & cmdline

Hi,

I’ve been doing some prototyping and load testing of the vpp dataplane, and 
have observed packet loss when I issue API requests or use the debug command 
line.  Is this to be expected given the use of the worker_thread_barrier, or 
might there be some way I could improve matters?

Currently I’m running a fairly modest 2Mpps throughput between a pair of 10G 
ports on an Intel X520 NIC, with bare-metal Ubuntu 16 and vpp 17.01.

Thanks in advance,

Colin.
