> why we ended up needing such a complicated procedure to apply what looked 
> like simple updates.

All I can tell is that avf_process already preferred to suspend itself
(rather than busy-wait) in the first AVF commit [13],
and that the wait times later got even longer [14].

> I’m pretty sure we could mark things as down and program an async cleanup from
> within the avf layer.

Probably yes, but the avf layer should keep track of the removal,
so it knows to be careful on a subsequent add.
I guess the current way is safer, especially if we want to panic early on any error.

> it would be good if we could avoid suspending the current process in avf
> because it can’t know all the ways in which the calling process could be 
> signaled.

Yes, requiring each process to exhibit only one actor behavior would fix the issue:
avf_send_to_pf could call vlib_time_now in a loop (instead of suspending avf_process).
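A minimal sketch of that busy-wait alternative, assuming the reply check and the clock can be passed in as callbacks. In VPP the clock would come from vlib_time_now (vm); busy_wait_for_reply, now_fn_t, poll_fn_t and the mock context are hypothetical names invented here so the toy runs on its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callbacks: the clock (vlib_time_now in VPP) and the
   "is the PF reply there yet?" check are both mocked in this sketch. */
typedef double (*now_fn_t) (void *ctx);
typedef bool (*poll_fn_t) (void *ctx);

/* Spin until poll_reply () succeeds or timeout_sec elapses.
   Returns 0 on reply, -1 on timeout (where the real code might panic).
   The calling process is never suspended, so no unrelated event
   (such as SOCKET_READ_EVENT) can resume it in the middle of the wait. */
static int
busy_wait_for_reply (now_fn_t now, poll_fn_t poll_reply, void *ctx,
		     double timeout_sec)
{
  assert (timeout_sec >= 0);
  double deadline = now (ctx) + timeout_sec;
  while (now (ctx) < deadline)
    if (poll_reply (ctx))
      return 0;
  return -1;
}

/* Mock context: every clock read advances time by 1 ms, and the reply
   shows up on the reply_after_polls-th poll. */
typedef struct
{
  double t;
  int polls;
  int reply_after_polls;
} mock_ctx_t;

static double
mock_now (void *ctx)
{
  mock_ctx_t *m = ctx;
  m->t += 0.001;
  return m->t;
}

static bool
mock_poll (void *ctx)
{
  mock_ctx_t *m = ctx;
  return ++m->polls >= m->reply_after_polls;
}
```

The trade-off is that the thread spins without doing any other work until the reply arrives or the deadline passes.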

But I think VPP could easily offer some support for multiple actor behaviors,
for example by allowing processes to list which event types should be able to wake them
(so that vl_api_clnt_process, while waiting in avf_process_request, would not wake up upon
SOCKET_READ_EVENT).
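A toy model of such a "wake mask", assuming one bit per event type. This is not how vlib stores events (vlib keeps event data per type, this toy only records that a type was signaled); proc_t, proc_signal, proc_wait, proc_take and the EV_* values are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event types; the values are arbitrary for this toy. */
enum { EV_SOCKET_READ = 1, EV_AVF_RESPONSE = 2 };

/* Bit i of wake_mask set: event type i may resume the process.
   Signals outside the mask are recorded as pending but do not wake it,
   so they are not lost either. */
typedef struct
{
  uint64_t pending;
  uint64_t wake_mask;
  bool runnable;
} proc_t;

static void
proc_signal (proc_t *p, unsigned type)
{
  assert (type < 64);
  p->pending |= 1ULL << type;
  if (p->pending & p->wake_mask)
    p->runnable = true;
}

/* Suspend with a new mask, e.g. "only the AVF response" while inside
   avf_process_request, or "anything" back in the main API loop. */
static void
proc_wait (proc_t *p, uint64_t mask)
{
  p->wake_mask = mask;
  p->runnable = (p->pending & mask) != 0;
}

/* Consume the lowest-numbered pending type allowed by the mask, or -1. */
static int
proc_take (proc_t *p)
{
  uint64_t ready = p->pending & p->wake_mask;
  for (int type = 0; type < 64; type++)
    if (ready & (1ULL << type))
      {
	p->pending &= ~(1ULL << type);
	return type;
      }
  return -1;
}
```

A SOCKET_READ_EVENT arriving while the mask only allows the AVF response stays pending and is picked up once the process widens its mask again.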

Vratko.

[13] https://gerrit.fd.io/r/c/vpp/+/10457/41/src/plugins/avf/device.c#377
[14] https://gerrit.fd.io/r/c/vpp/+/21831/3/src/plugins/avf/device.c#415

From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Florin Coras
Sent: Tuesday, 2022-September-13 19:54
To: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] request-response between vlib processes

Hi Vratko,


On Sep 13, 2022, at 5:03 AM, Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco) via lists.fd.io <vrpolak=cisco....@lists.fd.io> wrote:

In general, most “communication” between VPP components
is done by directly calling C functions,
so it makes sense that avf_flag_change is called within the vl_api_clnt_process process.
It is avf_process_request (called directly by avf_flag_change)
that decides to hand off the request to the avf_process process for async handling,
so it should make sure to resume the API process correctly upon the response.

> just to set a mac address?

In my particular test the async operation switches promiscuous mode on an interface,
but I guess it does not really matter what a particular operation does.
What matters is that there is a synchronous API call (l2_patch_add_del in my test)
which only indirectly causes an asynchronous operation (as the interface uses the AVF driver).


Didn’t have an issue with how the api ends up calling avf_process_request. I 
was just wondering why we ended up needing such a complicated procedure to 
apply what looked like simple updates.


> Do we really need to block the binary api

The l2_patch_add_del does block.
Especially in the “del” case, the subsequent API calls
need to know whether the interface is gone yet or not.

I’m pretty sure we could mark things as down and program an async cleanup from
within the avf layer. That is, if async is necessary, for deletes we should be
able to provide a return code as soon as we find that the device/state exists
and program the removal.
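A rough sketch of that delete path, combined with the bookkeeping mentioned in the reply at the top (tracking the removal so a subsequent add is careful). The state names, the fixed-size table, the cleanup queue and the errno-style return codes are assumptions for illustration; none of this is the avf plugin's actual code.

```c
#include <assert.h>
#include <errno.h>

#define N_DEVS 8

/* Hypothetical per-device state, indexed by device id. */
typedef enum { DEV_FREE = 0, DEV_UP, DEV_REMOVING } dev_state_t;

static dev_state_t dev_state[N_DEVS];
static int cleanup_queue[N_DEVS]; /* ids waiting for async cleanup */
static int n_queued;

/* Delete: answer as soon as we know the device exists; the slow part
   (talking to the PF) is deferred to a queue drained later. */
static int
dev_del (int id)
{
  if (id < 0 || id >= N_DEVS || dev_state[id] != DEV_UP)
    return -ENOENT;
  dev_state[id] = DEV_REMOVING; /* marked down, no longer usable */
  assert (n_queued < N_DEVS);
  cleanup_queue[n_queued++] = id;
  return 0;
}

/* Add must be careful: a slot still being cleaned up is not reusable. */
static int
dev_add (int id)
{
  if (id < 0 || id >= N_DEVS)
    return -EINVAL;
  if (dev_state[id] == DEV_REMOVING)
    return -EBUSY;
  if (dev_state[id] == DEV_UP)
    return -EEXIST;
  dev_state[id] = DEV_UP;
  return 0;
}

/* Would run later from the driver's own process, not the API handler. */
static void
run_cleanups (void)
{
  for (int i = 0; i < n_queued; i++)
    dev_state[cleanup_queue[i]] = DEV_FREE;
  n_queued = 0;
}
```

Returning -EBUSY is only one possible policy for an add that races a pending removal; the add could also queue itself behind the cleanup.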

But for adds, it would be good if we could avoid suspending the current process 
in avf because it can’t know all the ways in which the calling process could be 
signaled.


> pass opaques in requests

As usual, there are several ways to make it work,
we just need to pick one (and put an example usage into the docs).

And I believe that’s what we’re discussing here :-)

Florin



Vratko.

From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Florin Coras
Sent: Monday, 2022-September-12 23:11
To: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] request-response between vlib processes

Hi Vratko,

Do we really need to block the binary api waiting for a reply from another vpp 
process just to set a mac address?

If setting up the mac (or similar) cannot be done synchronously, probably api
handlers should hand over all those requests to another vpp process,
vl_api_async_req_process, that takes care of async execution and generation of
api replies. You could also pass opaques in requests and maybe expect backends,
like avf_process, to bounce those opaques back for demuxing.
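A minimal sketch of the opaque round trip, assuming the opaque is simply a slot index into a table of pending requests. client_index and context mirror the fields VPP API request messages carry; vl_api_async_req_process is only a proposal in this thread, and pending_req_t, req_submit and req_complete are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* What the async process needs in order to build the API reply later. */
typedef struct
{
  uint32_t client_index; /* which API client to answer */
  uint32_t context;      /* client's context, copied into the reply */
  int in_use;
} pending_req_t;

#define MAX_PENDING 16
static pending_req_t pending[MAX_PENDING];

/* Register a request; the returned slot index is the opaque handed to
   the backend. Returns -1 if the table is full. */
static int
req_submit (uint32_t client_index, uint32_t context)
{
  for (int i = 0; i < MAX_PENDING; i++)
    if (!pending[i].in_use)
      {
	pending[i] = (pending_req_t){ client_index, context, 1 };
	return i;
      }
  return -1;
}

/* Backend completion: the opaque comes back untouched and selects the
   original request. A stale or bogus opaque yields in_use == 0. */
static pending_req_t
req_complete (int opaque)
{
  pending_req_t r = { 0 };
  if (opaque >= 0 && opaque < MAX_PENDING && pending[opaque].in_use)
    {
      r = pending[opaque];
      pending[opaque].in_use = 0;
    }
  return r;
}
```

Because each completion carries its own opaque, replies can arrive in any order and still reach the right client.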

Regards,
Florin



On Sep 12, 2022, at 4:49 AM, Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco) via lists.fd.io <vrpolak=cisco....@lists.fd.io> wrote:

[resending to the correct vpp-dev e-mail address]

Short version:
Vratko would appreciate something like 
vlib_current_process_wait_for_one_time_event_or_clock.

Medium version:
One instance of request-response interaction between vlib processes had a bug [11].
Vratko contributed a fix [9] for the immediate issue,
but the proper fix was left hinted at in TODOs (and is discussed in the long version).

Long version:

Vlib supports processes and signals, see corresponding sections in the docs [7].
Using the actor model vocabulary, a (vlib) process is an actor,
and (vlib) signaling a (vlib) event means sending a message between actors.
There is no vlib name for actor behavior [10].

The typical use of event signaling in VPP is “fire and forget”,
meaning a “request” without any need to respond.
That means a typical process has just one behavior;
the side effects of a process are given by event type (and data),
not directly by the sequence of previous events received.

But there is an exception (and in the future there may be more).
When avf_process handles AVF_PROCESS_EVENT_REQ
and detects that it was signaled by some other process,
it signals back a “response” event.
The main reason is that some operations may take an unreasonably long time,
and we prefer VPP to crash there (instead of getting stuck)
so we can see the backtrace.

A typical process that signals AVF_PROCESS_EVENT_REQ is vl_api_clnt_process,
whose loop usually handles SOCKET_READ_EVENT events.
This socket API handling process has no idea about AVF-plugin-specific needs,
but it can call the avf_process_request function, which (upon detecting it is not called
from the avf_process process) does the signaling and waiting.

But this means vl_api_clnt_process is the first process (that I know of) with two behaviors.
The first one focuses on handling new API messages,
the second one focuses on handling the AVF response (especially the lack thereof in time).
As clib_panic is called when the response does not arrive
(and I hope there are never two requests at the same time),
the first behavior never encounters the AVF response.
But the second behavior can encounter SOCKET_READ_EVENT.
The VPP-2033 [11] bug is what happens in that case.
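A toy model of the hazard, assuming a single FIFO of event types per process. It is not the actual VPP-2033 code path; it only shows one way the second behavior can go wrong when SOCKET_READ_EVENT arrives, and one way (deferring foreign events) to avoid it. evq_t, the EV_* values and both wait functions are hypothetical.

```c
#include <assert.h>

/* Hypothetical event types; 0 mirrors "the response is type zero". */
enum { EV_NONE = -1, EV_AVF_RESPONSE = 0, EV_SOCKET_READ = 1 };

/* Toy FIFO of event types delivered to one process (capacity 16). */
typedef struct
{
  int ev[16];
  int head, tail;
} evq_t;

static void
evq_push (evq_t *q, int type)
{
  assert (q->tail - q->head < 16);
  q->ev[q->tail++ % 16] = type;
}

static int
evq_pop (evq_t *q)
{
  return q->head == q->tail ? EV_NONE : q->ev[q->head++ % 16];
}

/* Problematic shape: whatever resumed the process is taken as the
   response, so a SOCKET_READ_EVENT is both misread and swallowed. */
static int
wait_response_naive (evq_t *q)
{
  return evq_pop (q); /* caller assumes this is EV_AVF_RESPONSE */
}

/* Safer shape: only the response ends the wait; foreign events go to
   `deferred` so the API loop can handle them afterwards. An empty queue
   stands in for the clock expiring. Returns 1 on response, 0 on timeout
   (where the real code calls clib_panic). */
static int
wait_response (evq_t *q, evq_t *deferred, int max_wakeups)
{
  for (int i = 0; i < max_wakeups; i++)
    {
      int type = evq_pop (q);
      if (type == EV_AVF_RESPONSE)
	return 1;
      if (type == EV_NONE)
	break;
      evq_push (deferred, type);
    }
  return 0;
}
```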

A minor issue is that the “response” event is identified only by
its event type being zero, which would not work in (hypothetical future) scenarios
where a single process waits for two different responses.

Reading through node_funcs.h I found vlib_current_process_wait_for_one_time_event [12],
which looks suited for waiting for “single response” events,
but it lacks the time awareness of vlib_process_wait_for_event_or_clock.
If we had something like vlib_current_process_wait_for_one_time_event_or_clock
(and an example of its usage in the docs), handling the response would become easier.
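A toy model of the semantics such a function might have, assuming: each request allocates its own event type (in the spirit of the one-time events in node_funcs.h [12], which also removes the "type zero means response" ambiguity), the wait ends on that type or on a deadline, and other event types stay pending for the caller's main loop. toy_proc_t, the integer millisecond clock and the scheduled-signal fields are invented for the sketch; this is neither the vlib implementation nor a proposed signature.

```c
#include <assert.h>
#include <stdbool.h>

#define N_TYPES 32
#define FIRST_ONE_TIME 8 /* types below this are ordinary, e.g. socket reads */

typedef struct
{
  int pending[N_TYPES];    /* signal count per event type */
  bool allocated[N_TYPES]; /* one-time types currently in use */
  long now_ms;             /* toy clock */
  /* Toy stand-in for another process: deliver sched_type at sched_at. */
  int sched_type;
  long sched_at;
} toy_proc_t;

/* Each request gets a fresh type, so two outstanding responses
   can never be confused with each other. */
static int
create_one_time_event (toy_proc_t *p)
{
  for (int t = FIRST_ONE_TIME; t < N_TYPES; t++)
    if (!p->allocated[t])
      {
	p->allocated[t] = true;
	p->pending[t] = 0;
	return t;
      }
  return -1;
}

static void
signal_event (toy_proc_t *p, int type)
{
  assert (type >= 0 && type < N_TYPES);
  p->pending[type]++;
}

/* Stands in for the process being suspended for one clock tick. */
static void
advance_clock (toy_proc_t *p)
{
  p->now_ms++;
  if (p->sched_type >= 0 && p->now_ms == p->sched_at)
    signal_event (p, p->sched_type);
}

/* Hypothetical semantics: true if `type` was signaled within timeout_ms,
   false on timeout. Other types are left pending, and the one-time type
   is released either way. */
static bool
wait_for_one_time_event_or_clock (toy_proc_t *p, int type, long timeout_ms)
{
  assert (type >= FIRST_ONE_TIME && type < N_TYPES && p->allocated[type]);
  long deadline = p->now_ms + timeout_ms;
  bool got;
  for (;;)
    {
      got = p->pending[type] > 0;
      if (got || p->now_ms >= deadline)
	break;
      advance_clock (p);
    }
  p->pending[type] = 0;
  p->allocated[type] = false;
  return got;
}
```

Releasing the type on timeout means a late reply would land on a freed (and possibly reused) type; in the flow described above the timeout path panics anyway, so that case would not be reached.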

Vratko.

[7] https://github.com/FDio/vpp/blob/9ad39c026c8a3c945a7003c4aa4f5cb1d4c80160/docs/developer/corearchitecture/vlib.rst
[9] https://gerrit.fd.io/r/c/vpp/+/37083
[10] https://en.wikipedia.org/wiki/Actor_model#Behaviors
[11] https://jira.fd.io/browse/VPP-2033
[12] https://github.com/FDio/vpp/blob/16052480c377127f9cb7facbab53f46e595b27cf/src/vlib/node_funcs.h#L1186

View/Reply Online (#21897): https://lists.fd.io/g/vpp-dev/message/21897
