Hi Vratko,

> On Sep 13, 2022, at 5:03 AM, Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES 
> at Cisco) via lists.fd.io <vrpolak=cisco....@lists.fd.io> wrote:
> 
> In general, most of “communication” between VPP components
> is done by directly calling C functions,
> so it makes sense that avf_flag_change is called within the
> vl_api_clnt_process process.
> It is avf_process_request (called directly by avf_flag_change)
> that decides to hand off the request to the avf_process process for async
> handling,
> so it should make sure to resume the API process correctly upon the response.
> 
> > just to set a mac address? 
>  
> In my particular test the async operation switches promiscuous mode on an 
> interface,
> but I guess it does not really matter what a particular operation does.
> What matters is that there is a synchronous API call (l2_patch_add_del in my
> test) which only indirectly causes an asynchronous operation (as the interface
> uses the AVF driver).
>  

I didn’t have an issue with how the API ends up calling avf_process_request. I 
was just wondering why we ended up needing such a complicated procedure to 
apply what looked like simple updates.

> > Do we really need to block the binary api 
>  
> The l2_patch_add_del does block.
> Especially in the “del” case, the subsequent API calls
> need to know whether the interface is gone yet or not.

I’m pretty sure we could mark things as down and program an async cleanup from 
within the avf layer. That is, if async is necessary, for deletes we should be 
able to provide a return code as soon as we find that the device/state exists 
and program the removal.
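
To make the delete idea concrete, here is a minimal sketch in plain C, not VPP code. All names (dev_t, dev_delete_request, dev_run_pending_cleanups, the return codes) are hypothetical, and treating a device with a pending removal as already gone is an assumption on my part. The request handler only checks state, marks the device down, and queues the removal; the actual teardown runs later from whatever process owns the device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; none of these exist in VPP. */
enum { DEV_RC_OK = 0, DEV_RC_NO_SUCH_DEVICE = -1 };

typedef struct
{
  int in_use;          /* slot currently holds a device */
  int admin_down;      /* set synchronously by the delete request */
  int cleanup_pending; /* removal queued for the async worker */
} dev_t;

#define N_DEVS 4
static dev_t devs[N_DEVS];

/* Synchronous part: validate, mark down, queue the removal, and return a
 * code right away. The caller is never suspended. */
int
dev_delete_request (size_t idx)
{
  /* Assumption: a device with a removal already queued counts as gone. */
  if (idx >= N_DEVS || !devs[idx].in_use || devs[idx].cleanup_pending)
    return DEV_RC_NO_SUCH_DEVICE;
  devs[idx].admin_down = 1;
  devs[idx].cleanup_pending = 1;
  return DEV_RC_OK;
}

/* Asynchronous part: called later by the owning process. Frees every
 * queued device and returns how many were removed. */
int
dev_run_pending_cleanups (void)
{
  int removed = 0;
  for (size_t i = 0; i < N_DEVS; i++)
    if (devs[i].cleanup_pending)
      {
        /* A queued device must have been marked down by the request. */
        assert (devs[i].in_use && devs[i].admin_down);
        devs[i] = (dev_t){ 0 };
        removed++;
      }
  return removed;
}
```

With this shape, a second delete for the same device fails immediately, so later API calls get a consistent answer without waiting for the hardware.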

But for adds, it would be good if we could avoid suspending the current process 
in avf, because avf can’t know all the ways in which the calling process could 
be signaled.

>  
> > pass opaques in requests
>  
> As usual, there are several ways to make it work,
> we just need to pick one (and put an example usage into the docs).

And I believe that’s what we’re discussing here :-) 
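
As one candidate, the opaque approach could look roughly like the sketch below. It is plain C rather than VPP code, and every name in it (req_t, resp_t, pending_add, backend_handle, pending_complete) is hypothetical. The requester stores per-request context under an opaque, the backend copies the opaque into its response untouched, and the requester uses it to find which client the response belongs to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message layouts; not the AVF or binary API formats. */
typedef struct { uint32_t opaque; int op; } req_t;
typedef struct { uint32_t opaque; int retval; } resp_t;

#define MAX_PENDING 8
typedef struct
{
  int in_use;
  uint32_t opaque;       /* key bounced back by the backend */
  uint32_t client_index; /* who to send the API reply to */
} pending_t;

static pending_t pending[MAX_PENDING];
static uint32_t next_opaque = 1; /* 0 is reserved for "no slot" */

/* Remember the client and return the opaque to put in the request,
 * or 0 if too many requests are outstanding. */
uint32_t
pending_add (uint32_t client_index)
{
  for (int i = 0; i < MAX_PENDING; i++)
    if (!pending[i].in_use)
      {
        uint32_t opaque = next_opaque++;
        if (next_opaque == 0)
          next_opaque = 1; /* skip the reserved value on wrap */
        pending[i].in_use = 1;
        pending[i].opaque = opaque;
        pending[i].client_index = client_index;
        return opaque;
      }
  return 0;
}

/* Stand-in for the backend: it does not interpret the opaque, it only
 * copies it into the response. */
resp_t
backend_handle (req_t req)
{
  resp_t resp = { req.opaque, req.op >= 0 ? 0 : -1 };
  return resp;
}

/* Demux a response: look up the client by opaque and free the slot.
 * Returns 1 if the response matched an outstanding request, else 0. */
int
pending_complete (resp_t resp, uint32_t *client_index)
{
  assert (client_index != NULL);
  for (int i = 0; i < MAX_PENDING; i++)
    if (pending[i].in_use && pending[i].opaque == resp.opaque)
      {
        *client_index = pending[i].client_index;
        pending[i].in_use = 0;
        return 1;
      }
  return 0;
}
```

Because responses are matched by opaque rather than by event type, several requests can be outstanding at once and a stale or duplicate response is simply rejected.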

Florin

>  
> Vratko.
>  
> From: vpp-dev@lists.fd.io On Behalf Of Florin Coras
> Sent: Monday, 2022-September-12 23:11
> To: vpp-dev@lists.fd.io
> Subject: Re: [vpp-dev] request-response between vlib processes
>  
> Hi Vratko, 
>  
> Do we really need to block the binary api waiting for a reply from another 
> vpp process just to set a mac address? 
>  
> If setting up the mac (or similar) cannot be done synchronously, api handlers 
> should probably hand over all those requests to another vpp process, 
> vl_api_async_req_process, that takes care of async execution and generation 
> of api replies. You could also pass opaques in requests and maybe expect 
> backends, like avf_process, to bounce those opaques back for demuxing. 
>  
> Regards,
> Florin
> 
> 
> On Sep 12, 2022, at 4:49 AM, Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES 
> at Cisco) via lists.fd.io <vrpolak=cisco....@lists.fd.io> wrote:
>  
> [resending to the correct vpp-dev e-mail address]
>  
> Short version:
> Vratko would appreciate something like 
> vlib_current_process_wait_for_one_time_event_or_clock.
>  
> Medium version:
> One instance of request-response interaction between vlib processes had a
> bug [11].
> Vratko contributed a fix [9] for the immediate issue,
> but the proper fix was only hinted at in TODOs (and is discussed in the long
> version).
>  
> Long version:
>  
> Vlib supports processes and signals, see corresponding sections in the docs 
> [7].
> Using the actor model vocabulary, a (vlib) process is an actor,
> and (vlib) signaling a (vlib) event means sending a message between actors.
> There is no vlib name for actor behavior [10].
>  
> The typical use of event signaling in VPP is “fire and forget”,
> meaning a “request” without any need to respond.
> That means a typical process has just one behavior;
> the side effects of a process are given by event type (and data),
> not directly by the sequence of previous events received.
>  
> But there is an exception (and in future there may be more).
> The process avf_process, when handling AVF_PROCESS_EVENT_REQ
> and detecting that it was signaled by some other process,
> signals back a “response” event.
> The main reason is that some operations may take an unreasonably long time,
> and we prefer VPP to crash there (instead of getting stuck)
> so we can see the backtrace.
>  
> A typical process that signaled AVF_PROCESS_EVENT_REQ is vl_api_clnt_process,
> whose loop usually handles SOCKET_READ_EVENT events.
> I mean, this socket API handling process has no idea about AVF plugin 
> specific needs,
> but it can call avf_process_request function which (upon detecting it is not 
> called
> from avf_process process) does the signaling and waiting.
>  
> But this means vl_api_clnt_process is the first process (that I know of) with 
> two behaviors.
> The first one focuses on handling new API messages,
> the second one focuses on handling the AVF response (especially the lack 
> thereof in time).
> As clib_panic is called when the response does not arrive
> (and I hope there are never two requests at the same time),
> the first behavior never encounters the AVF response.
> But the second behavior can encounter SOCKET_READ_EVENT.
> The VPP-2033 [11] bug is what happens in that case.
>  
> A minor issue is that the “response” event is defined just by
> the event type being zero, which would not work in (hypothetical future)
> scenarios where a single process waits for two different responses.
>  
> Reading through node_funcs.h I found 
> vlib_current_process_wait_for_one_time_event [12],
> which looks suited for waiting for “single response” events,
> but it lacks the time awareness of vlib_process_wait_for_event_or_clock.
> If we had something like vlib_current_process_wait_for_one_time_event_or_clock
> (and its example usage in the docs), handling the response would become 
> easier.
>  
> Vratko.
>  
> [7] https://github.com/FDio/vpp/blob/9ad39c026c8a3c945a7003c4aa4f5cb1d4c80160/docs/developer/corearchitecture/vlib.rst
> [9] https://gerrit.fd.io/r/c/vpp/+/37083
> [10] https://en.wikipedia.org/wiki/Actor_model#Behaviors
> [11] https://jira.fd.io/browse/VPP-2033
> [12] https://github.com/FDio/vpp/blob/16052480c377127f9cb7facbab53f46e595b27cf/src/vlib/node_funcs.h#L1186
