Hi Neale,

I agree that something like option 4 is probably the correct approach. I had a 
side-meeting with some of the ARM folks (Govind and Honnappa), and we thought 
using a generation number for the state, rather than just waiting "long enough" 
to recycle it, could work. The generation number would be the atomic value 
associated with the state. So consider this API (a rough C sketch follows the 
list):

 - MP-safe pools store generation numbers alongside each object.
 - When you allocate a new object from the pool you get an index and generation 
number.
 - When storing the object index you also save the generation number.
 - When getting a pointer to the object you pass the API the index and 
generation number; it returns NULL if the generation number does not match the 
one stored with the object in the pool.
 - When you delete a pool object its generation number is incremented (with a 
barrier).
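
Roughly, as a standalone sketch (the gen_pool_* names are hypothetical, not an 
existing vppinfra API, and a real version would keep a free list rather than 
scanning for a free slot):

  /* Generation-checked pool sketch. Each element carries a generation
     number; lookups must present the generation saved at allocation
     time and get NULL back if it no longer matches. */
  #include <stdint.h>
  #include <stddef.h>

  typedef struct
  {
    uint32_t generation;          /* bumped on every delete */
    int in_use;
    /* ... the user's object data would follow ... */
  } gen_pool_elt_t;

  typedef struct
  {
    gen_pool_elt_t *elts;         /* caller-allocated array */
    uint32_t n_elts;
  } gen_pool_t;

  /* Allocate: returns an index plus the generation the caller must store. */
  static int
  gen_pool_get (gen_pool_t *p, uint32_t *index, uint32_t *generation)
  {
    for (uint32_t i = 0; i < p->n_elts; i++)
      if (!p->elts[i].in_use)
        {
          p->elts[i].in_use = 1;
          *index = i;
          *generation = p->elts[i].generation;
          return 0;
        }
    return -1;                    /* full; a real pool would grow */
  }

  /* Lookup: NULL means the object was deleted (and maybe reused) since
     the caller saved its (index, generation) pair. */
  static gen_pool_elt_t *
  gen_pool_elt_at (gen_pool_t *p, uint32_t index, uint32_t generation)
  {
    gen_pool_elt_t *e = &p->elts[index];
    if (__atomic_load_n (&e->generation, __ATOMIC_ACQUIRE) != generation)
      return NULL;
    return e;
  }

  /* Delete: bump the generation so stale handles stop resolving, then
     make the slot available for immediate reuse. */
  static void
  gen_pool_put (gen_pool_t *p, uint32_t index)
  {
    gen_pool_elt_t *e = &p->elts[index];
    __atomic_add_fetch (&e->generation, 1, __ATOMIC_RELEASE);
    e->in_use = 0;
  }

A packet that carried an (index, generation) pair across a handoff and comes 
back after a delete simply gets NULL and can be dropped.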

The size of the generation number needs to be large enough to guarantee it 
cannot wrap while objects that stored the old generation number are still in 
the system. Technically this is the "long-enough" aspect of the scheme. :) One 
could imagine using fewer than 64 bits for the combination of index and 
generation, if that were important (a packing sketch follows).
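
For example, something along these lines; the 32-bit index / 16-bit generation 
split into a 48-bit handle is just an arbitrary illustration, not a proposal:

  /* Pack (index, generation) into a single handle narrower than 64 bits.
     Widths must be chosen so the generation cannot plausibly wrap while
     stale handles are still in flight. */
  #include <stdint.h>

  #define OBJ_GEN_BITS 16
  #define OBJ_GEN_MASK ((1ULL << OBJ_GEN_BITS) - 1)

  static inline uint64_t
  obj_handle_make (uint32_t index, uint32_t generation)
  {
    return ((uint64_t) index << OBJ_GEN_BITS) | (generation & OBJ_GEN_MASK);
  }

  static inline uint32_t
  obj_handle_index (uint64_t h)
  {
    return (uint32_t) (h >> OBJ_GEN_BITS);
  }

  static inline uint32_t
  obj_handle_generation (uint64_t h)
  {
    return (uint32_t) (h & OBJ_GEN_MASK);
  }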

The advantage over just waiting N seconds to recycle the index is that the 
system scales better: if you just wait N seconds to reuse, and are creating 
and deleting objects at a significant rate, your pool can blow up during those 
N seconds. With the generation number this is not a problem, as you can reuse 
the object immediately. Another advantage is that you don't need the timer 
logic (looping per pool, or processing all pools) to free up old indices.

The generation number scheme will still need the thread barrier when 
incrementing the generation number, to make sure no one is using the object in 
parallel (a sketch of the delete path follows). But this is a common problem 
with deleting non-reference-counted shared state, I believe.
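
Something like this for the delete path (reusing the hypothetical gen_pool_* 
names from the sketch above; the lookup-table removal is also hypothetical, 
and the barrier calls are the usual vlib ones):

  #include <vlib/vlib.h>

  static void
  my_object_delete (vlib_main_t *vm, gen_pool_t *p, uint32_t index)
  {
    vlib_worker_thread_barrier_sync (vm);

    /* 1. unlink from the lookup data-structures so no new packets find it */
    /* my_lookup_table_remove (index); */

    /* 2. bump the generation so any stale (index, generation) pair held by
       an in-flight packet resolves to NULL and the packet can be dropped */
    gen_pool_put (p, index);

    vlib_worker_thread_barrier_release (vm);
  }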

When you mentioned packet counters, that's really a reference count, I guess. 
The trade-off here seems to me to be two cache-line invalidates per packet 
(once on ingress, once on egress) for the counter, versus a barrier hit (all 
packet processing stops) per delete of the state. For the setup where you 
measured the packet-counter solution, how long does the system spend from the 
barrier sync request to the release (i.e., how long is it not processing 
packets)?

Thanks,
Chris.

> On Apr 15, 2020, at 5:38 AM, Neale Ranns (nranns) <nra...@cisco.com> wrote:
> 
> 
> Hi Chris,
> 
> Firstly, apologies for the lengthy delay. 
> 
> When I say 'state' in the following I'm referring to some object[s] that are 
> used to forward packets. 
> 
> I'd classify the possible solution space as:
> 1) maintain per-packet counters for the state to indicate how many packets 
> currently refer to that state.
>     Pros; we know exactly when the state is no longer required and can be 
> safely removed.
>     Cons; significant per-packet cost, similar to maintaining counters. For 
> reference, on my [aging] system enabling adjacency counters takes ip4-rewrite 
> from 2.52e1 to 3.49e1 clocks. The wait times could be large (equivalent to 
> flushing queues). 
> 2) flush queues; ensure that there are no packets in flight, anywhere, when 
> the workers stop at the barrier.
>    Pros; It's certainly safe to delete state under these conditions.
>    Cons; for handoff this could be known, though the wait time would be long. 
> For async crypto HW this may not be knowable and if it is the wait times 
> would be large. Either way we may end up waiting for a worst-case scenario, 
> which is way longer than actually needed.
> 3) epochs; maintain a global epoch; each time an API is called, the epoch is 
> bumped. Packets entering the system get stamped with the current epoch. If a 
> node sees a packet whose epoch does not match the global one, it is dropped.
>    Pros: simple scheme, low/negligible DP cost.
>    Cons: all inflight packets would be dropped on all API calls, not just the 
> packets that would use the state that is being deleted.
> 4) MP safe: remove the state with the workers unblocked. This is a 
> multi-stage process. Firstly, unlink the state from the lookup 
> data-structures so no more packets can find it. Secondly, 'atomically' update 
> the state so that packets using it still perform a consistent action 
> (probably drop). Thirdly, don't reuse that state (i.e. recycle its pool 
> index) until all the inflight packets pass through the system (mis-forwarding 
> must be avoided). Make-before-break, if that term means anything to you :)
>    Pros; MP safe is always good, since there are fewer packet drops. Zero 
> per-packet DP cost. 
>    Cons; it's not easy to get right, nor to test.
> 
> IMHO the drawbacks of options 1, 2 & 3 rule them out, which leaves us only 4.
> 
> For option 4, the first and second steps are very much dependent on the type 
> of state we're talking about. For SAs for example, unlinking the SA from the 
> lookup data-structure is accomplished using a separate API from the SA 
> delete*. The final step we can easily accomplish with a new version of the 
> pool allocator whose free-list prevents reuse for say 5 seconds (an age in DP 
> terms).
> 
> Thoughts?
> 
> /neale
> 
> * I note that a SA delete is already (optimistically) marked MP safe, which 
> assumes the system flushes in between these API calls.
> 
> 
> 
> 
> On 26/03/2020 16:09, "Christian Hopps" <cho...@chopps.org> wrote:
> 
> 
> 
>> On Mar 25, 2020, at 1:39 PM, Dave Barach via Lists.Fd.Io 
>> <dbarach=cisco....@lists.fd.io> wrote:
>> 
>> vlib_main_t *vm->main_loop_count.
>> 
>> One trip around the main loop accounts for all per-worker local graph edges 
>> / acyclic graph behaviors. 
>> 
>> As to the magic number E (not to be confused with e): repeatedly handing off 
>> packets from thread to thread seems like a bad implementation strategy. The 
>> packet tracer will tell you how many handoffs are involved in a certain 
>> path, as will a bit of code inspection.
> 
>    No, it would not be a good implementation strategy. :)
> 
>    However, I was looking at trying to code this in an upstreamable way, and 
> I didn't think I got to make assumptions about how others might wire things 
> together. I suppose we could just define a maximum number of handoffs and 
> then if users violated that number they would need to increase it?
> 
>> Neale has some experience with this scenario, maybe he can share some 
>> thoughts...
> 
>    Hoping so. :)
> 
>    I noticed that crypto engine handoffs were added to the non-dpdk ipsec 
> encrypt/decrypt in master, which seems somewhat relevant.
> 
>    Thanks,
>    Chris.
> 
