Re: pconn.cc assert index >= 0, async call queue madness

Alex Rousskov Fri, 11 Apr 2008 13:21:51 -0700

On Fri, 2008-04-11 at 15:21 +0200, Henrik Nordstrom wrote: 
> fre 2008-04-11 klockan 01:00 -0600 skrev Alex Rousskov:
> 
> > Receiving or handling a call is not an instantaneous event though (and
> > never was). Consider this pseudo-code:
> 
> Yes. And at Call::dial calling foo.handleThissCall() I consider the call
> made. The fact that foo.handleThisCall() hasn't returned yet is mostly
> irrelevant. And in the current code base it's completely irrelevant as
> it's the only function executing at the moment, only becomes a concern
> the day we look into SMP.


I agree. I just felt it was important to explain this nuance to avoid
the confusion about "in-progress" or "dialing" calls.

> Some suggestions on how this design can be improved to ease things a
> bit, minimizing the risk for bad use.
> 
> a) Comm needs to be redesigned to act as both caller and receiver for
> the legacy interfaces where it interfaces with function callback code
> and constructs internal Calls. Today it only acts as caller, making
> receiver cancellation of such calls unreliable.

I understand what you are saying and am not against it as long as the
line between Comm Core and Comm Receiver Wrappers for Legacy Code is
clear. 

However, it may be more cost-efficient to get rid of the legacy
interfaces than to work hard to support them better. Christos has done a
lot of that. While more work remains, it is not a huge pile. For
example, we only have 5 calls to comm_read_cancel and 26 calls to
comm_read, some of which are already obeying the new rules.

> b) Call should be extended with a mechanism where the caller can ask to
> get a notification if the receier cancels the call before it's being
> made, eleminating the need for the receiver to both cancel the call and
> tell the caller that he is no longer interested.

I believe a _general_ mechanism for that would be a step in the wrong
direction because:

a) Currently, the call is only "connected" to the recipient. The call
can be stored and passed around by others as needed. Connecting the call
to the caller changes that and complicates the design, You would have to
keep track of the caller and make sure it is always known and is either
ready to handle noteThatYourCallbackWasCanceled() or even do something
special to pass that to the next/final caller.

b) "Get a notification" in a general mechanism would imply async calling
(otherwise we are going to re-create recursive behavior because the
noteThatYourCallbackWasCanceled handler may cancel other calls!). It
would be messy to write calls that make other calls.

c) Not all calls are callbacks. For example, "sides", ICAP, and
BodyPipes do not use callbacks to communicate with each other. They use
calls. Whether that is worse, better, or as-good as Comm callback API is
a separate debate. Adding a heavy generic mechanism that many do not
need is not a good idea. Unfortunately, C++ does not make it easy to add
this mechanism to selected Call classes, without complicating the design
(i.e., more templates and such).

d) In the future, we will not be able to guarantee that callback
cancellation reaches the caller immediately: Even if the callback and
the caller are tied, the recipient and the caller may be in different
threads. The whole point that tying Call::cancel with
Module::cancel_operation_before_you_make_it becomes moot in such cases.


However, I am OK with custom mechanisms to accomplish the same goal.
For example, when Comm stuffs poll(2) arrays, it can detect operations
without live (i.e., not canceled) callbacks and stop polling those FDs
(or take another appropriate action).

Still, I think keeping/making Comm and AsyncCalls APIs simple and
focused on one thing while reviewing the code to make sure the APIs are
used correctly is better than making significantly more complex APIs and
wrappers to partially enforce a noncritical rule.


> > In other words, only call recipients should cancel calls. What makes it
> > confusing is that the current code uses things like comm_read_cancel
> > which can be interpreted as the recipient canceling its callback but
> > might be called by non-recipients as well. The suggestions below try to
> > clear this mess.
> 
> I do not view comm_read_cancel as having this dual personality. Should
> only be called by the same code which called comm_read. Any other use of
> comm_read_cancel is wrong.

Right, but I said "things like comm_read_cancel". I suspect, for
example, comm_close() is used by everybody and their friends to abort
transactions, with unrelated code waiting for the corresponding
callbacks.

> > If the reader does not want to or cannot read any more, it cancels the
> > call. It needs to keep a pointer to the call if it might cancel the
> > call. If the same reader thinks that comm might still be reading from
> > that file descriptor, the reader also calls comm_read_cancel. This last
> > sentence is unrelated to calls; it just makes sure comm is not doing
> > something in vein. Comm_read_cancel cancels the read, not the callback!
> > Call:cancel cancels the call, not the read.
> 
> I would like to get rid of the need to use comm_read_cancel here.
> Cancelling the call should be sufficient, and the caller should be able
> to derive from there when needed.

If you mean that canceling a read callback should trigger a chain
reaction to cancel the comm read, then I disagree for the reasons
discussed above. I think we should not tie the two things together, but
can provide a simple wrapper like comm_cancel_my_read() that would take
care of both the call and the read cancellations.

> What I worry about is how to prove reliable cancellations, and
> minimizing the risk of developers doing things wrong.

Call cancellations are already reliable.

A comm_cancel_my_read wrapper will help mitigate risk (but not eliminate
it). Note that once the comm_cancel_my_read is in place, the only
mistake the programmer can make is to call Call::cancel instead of
calling comm_cancel_my_read. The effect? Comm may poll the FD one extra
time. We are not risking much here, right?

> In this specific case receivers using the legacy callback function based
> versions of comm_read_cancel or commSetTimeout to cancel their requests
> expecting the request to be cancelled, but where the current code does
> not guarantee that because comm doesn't fulfill the requirements above.

I suggest that we rename comm_read_cancel and others t and add a
callback reference parameter to make sure that all code supplies the
stored callback to be canceled (with a firm guarantee!). If some code
does not have a stored callback, it would have to call
comm_read_cancel_if_not_too_late() instead, which may prompt the
developer or reviewer to think "What if it is too late?".

> I also worry about having a design which requires more than one
> conscious action to be taken by the developer of a recipient to both
> cancel the call and cancel the operation which should make the call
> (mainly for performance reasons, but also helps getting rid of now
> unwanted chain effects)..

Cancel_my_foo wrappers solve that problem. Only one action is required.

> Or we could get rid of comm_read_cancel entirely.. at least when dealing
> with AsyncCall based comm_read calls.

All comm_read calls are AsyncCall based, I think. Some calls are created
dynamically/internally (perhaps that is what you mean). The
corresponding readers will have to be either updated or use
comm_read_cancel_if_not_too_late. Otherwise, they will not compile.

> And we had a similar race in comm_close earlier if you remember.

Yes, I do. And we would had another "race" in comm_cancel_write if comm
implemented that (it should, but that is a different story).

> > Does this answer your questions?
> 
> Yes, but not yet in 100% agreement.

The only disagreement I detected is about tying AsyncCall and Comm via
Call::cancel(). Did I manage to convince you that simple wrappers like
comm_cancel_my_read are better than tying the two tightly with the
Call::cancel code?

Thank you,

Alex.

Re: pconn.cc assert index >= 0, async call queue madness

Reply via email to