[Xen-devel] [PATCH v2 for-4.6 3/3] tools/libxl: Only continue stream operations if the stream is still in progress
Part of the callback contract with check_all_finished() is that each running parallel task shall call it exactly once. Previously, it was possible for stream_continue() or write_toolstack_record() to fail and call into check_all_finished(). As the save helpers callback has fired, it no longer counts as in use, which causes check_all_finished() to fire the stream callback. Then, unwinding the stack back and calling check_all_finished() a second time results in the same conditions being observed, and the stream callback being fired a second time. To avoid this, check_all_finished() is called before any other actions which continue the stream functionality, and the stream is only continued if it has not been torn down. This guarantees not to continue stream operations if the stream does not owe a callback to check_all_finished(). Signed-off-by: Andrew Cooper --- CC: Ian Campbell CC: Ian Jackson CC: Wei Liu --- tools/libxl/libxl_stream_read.c | 14 -- tools/libxl/libxl_stream_write.c |5 +++-- 2 files changed, 11 insertions(+), 8 deletions(-) diff --git a/tools/libxl/libxl_stream_read.c b/tools/libxl/libxl_stream_read.c index 65100f4..3432933 100644 --- a/tools/libxl/libxl_stream_read.c +++ b/tools/libxl/libxl_stream_read.c @@ -738,14 +738,16 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void, goto err; } -/* - * Libxc has indicated that it is done with the stream. Resume reading - * libxl records from it. - */ -stream_continue(egc, stream); - err: check_all_finished(egc, stream, rc); + +if (libxl__stream_read_inuse(stream)) { +/* + * Libxc has indicated that it is done with the stream. Resume reading + * libxl records from it. + */ +stream_continue(egc, stream); +} } static void conversion_done(libxl__egc *egc, diff --git a/tools/libxl/libxl_stream_write.c b/tools/libxl/libxl_stream_write.c index 676ad0a..44f8b2f 100644 --- a/tools/libxl/libxl_stream_write.c +++ b/tools/libxl/libxl_stream_write.c @@ -275,10 +275,11 @@ void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void, goto err; } -write_toolstack_record(egc, stream); - err: check_all_finished(egc, stream, rc); + +if (libxl__stream_write_inuse(stream)) +write_toolstack_record(egc, stream); } static void write_toolstack_record(libxl__egc *egc, -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 for-4.6 3/3] tools/libxl: Only continue stream operations if the stream is still in progress
Andrew Cooper writes ("[PATCH v2 for-4.6 3/3] tools/libxl: Only continue stream operations if the stream is still in progress"): > Part of the callback contract with check_all_finished() is that each > running parallel task shall call it exactly once. ... > @@ -738,14 +738,16 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, > void *dcs_void, > goto err; > } > > -/* > - * Libxc has indicated that it is done with the stream. Resume reading > - * libxl records from it. > - */ > -stream_continue(egc, stream); > - > err: > check_all_finished(egc, stream, rc); > + > +if (libxl__stream_read_inuse(stream)) { > +/* > + * Libxc has indicated that it is done with the stream. Resume re\ ading > + * libxl records from it. > + */ > +stream_continue(egc, stream); > +} This doesn't look convincing to me. (Also, wrap damage in the comment.) It may be possible to prove that it's not buggy, but it's certainly confusing. In general the the which sets up the next callback should be at the end of functions, because that way reentrancy hazards are more easily seen to be avoided. Why not just do this: /* * Libxc has indicated that it is done with the stream. Resume reading * libxl records from it. */ stream_continue(egc, stream); +return; err: check_all_finished(egc, stream, rc); ? And the same in the next one. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 for-4.6 3/3] tools/libxl: Only continue stream operations if the stream is still in progress
On 28/07/15 14:41, Ian Jackson wrote: > Andrew Cooper writes ("[PATCH v2 for-4.6 3/3] tools/libxl: Only continue > stream operations if the stream is still in progress"): >> Part of the callback contract with check_all_finished() is that each >> running parallel task shall call it exactly once. > ... >> @@ -738,14 +738,16 @@ void libxl__xc_domain_restore_done(libxl__egc *egc, >> void *dcs_void, >> goto err; >> } >> >> -/* >> - * Libxc has indicated that it is done with the stream. Resume reading >> - * libxl records from it. >> - */ >> -stream_continue(egc, stream); >> - >> err: >> check_all_finished(egc, stream, rc); >> + >> +if (libxl__stream_read_inuse(stream)) { >> +/* >> + * Libxc has indicated that it is done with the stream. Resume re\ > ading >> + * libxl records from it. >> + */ >> +stream_continue(egc, stream); >> +} > This doesn't look convincing to me. (Also, wrap damage in the > comment.) It may be possible to prove that it's not buggy, but it's > certainly confusing. In general the the which sets up the next > callback should be at the end of functions, because that way > reentrancy hazards are more easily seen to be avoided. > > Why not just do this: > >/* > * Libxc has indicated that it is done with the stream. Resume reading > * libxl records from it. > */ >stream_continue(egc, stream); > +return; > > err: >check_all_finished(egc, stream, rc); > > ? > > And the same in the next one. That was my first attempt to fix this issue (as observed in v1), but it is buggy. libxl__xc_domain_restore_done() is required to call check_all_finished() exactly once, as part of its callback contract. Imagine a scenario whereby some error has occured and check_all_finished() has _abort()'ed the tasks, but the save helper was already on the way out, signalling success. In such a case, we would both omit the check_all_finished() call, thus omitting the eventual stream callback, and start a further action on a now-defunct stream. It is only save to stream_continue() if the stream is currently in use, which is not a guaranteed situation in this function even if rc is 0. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 for-4.6 3/3] tools/libxl: Only continue stream operations if the stream is still in progress
Andrew Cooper writes ("Re: [PATCH v2 for-4.6 3/3] tools/libxl: Only continue stream operations if the stream is still in progress"): > Imagine a scenario whereby some error has occured and > check_all_finished() has _abort()'ed the tasks, but the save helper was > already on the way out, signalling success. ... > It is only save to stream_continue() if the stream is currently in use, > which is not a guaranteed situation in this function even if rc is 0. Hrm. Yes. What do you think about putting the inuse check in stream_continue ? Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 for-4.6 3/3] tools/libxl: Only continue stream operations if the stream is still in progress
On 28/07/15 16:12, Ian Jackson wrote: > Andrew Cooper writes ("Re: [PATCH v2 for-4.6 3/3] tools/libxl: Only continue > stream operations if the stream is still in progress"): >> Imagine a scenario whereby some error has occured and >> check_all_finished() has _abort()'ed the tasks, but the save helper was >> already on the way out, signalling success. > ... >> It is only save to stream_continue() if the stream is currently in use, >> which is not a guaranteed situation in this function even if rc is 0. erm s/save/safe/ > Hrm. Yes. > > What do you think about putting the inuse check in stream_continue ? That would work on the stream_read side but not the stream_write side, but is not really correct IMO. The _inuse() check is needed because the save helper callback is not sure whether the stream is in use or not. This is a property of the save helper callback, rather than the stream. Pushing the _inuse() check into the next layer would function, but it adds extra _inuse() checks to other codepaths which should be fatal if they failed in other contexts. Would resubmitting with extra comments explaining this suffice? ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 for-4.6 3/3] tools/libxl: Only continue stream operations if the stream is still in progress
Andrew Cooper writes ("Re: [PATCH v2 for-4.6 3/3] tools/libxl: Only continue stream operations if the stream is still in progress"): > On 28/07/15 16:12, Ian Jackson wrote: > > What do you think about putting the inuse check in stream_continue ? > > That would work on the stream_read side but not the stream_write side, > but is not really correct IMO. I have reread the code on the write side and I think I agree with you there. The next function to be called is write_toolstack_record so things are a lot less abstract. On the read side I looked at a lot of functions which contained: stream_continue(egc, stream); return; err: assert(rc); stream_complete(egc, stream, rc); and ISTM that if the analysis leading to these call sites for stream_continue is wrong, a similar hazard would exist. > The _inuse() check is needed because the save helper callback is not > sure whether the stream is in use or not. This is a property of the > save helper callback, rather than the stream. In this event-driven code, I prefer to have a situation where the next thing to do is obvious from the recorded state, and is calculated explicitly (see also the way that check_all_finished is now). This is much less confusing and error prone than code where the next thing to do depends on _both_ the recorded state, _and_ the call path (representing the most recent thing that happened). > Pushing the _inuse() check into the next layer would function, but it > adds extra _inuse() checks to other codepaths which should be fatal if > they failed in other contexts. I don't understand how an _inuse check in stream_continue could ever do the wrong thing. If the code is reorganised later, it seems more likely to me that there will introduced new situations where we need to do "see if we are still running, and if so do stream_continue, otherwise go to check_all_finished". Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 for-4.6 3/3] tools/libxl: Only continue stream operations if the stream is still in progress
On 28/07/15 16:59, Ian Jackson wrote: > Andrew Cooper writes ("Re: [PATCH v2 for-4.6 3/3] tools/libxl: Only continue > stream operations if the stream is still in progress"): >> On 28/07/15 16:12, Ian Jackson wrote: >>> What do you think about putting the inuse check in stream_continue ? >> That would work on the stream_read side but not the stream_write side, >> but is not really correct IMO. > I have reread the code on the write side and I think I agree with you > there. The next function to be called is write_toolstack_record so > things are a lot less abstract. > > On the read side I looked at a lot of functions which contained: > > stream_continue(egc, stream); > return; > >err: > assert(rc); > stream_complete(egc, stream, rc); > > and ISTM that if the analysis leading to these call sites for > stream_continue is wrong, a similar hazard would exist. The vast majority of these cases are from internal callbacks. i.e. they are within the logical context of the current running stream. The cases which come in from some external context are libxl__stream_read_start_checkpoint() and libxl__xc_domain_restore_done(). The latter is the problem case, while the former asserts that the stream is in use. > >> The _inuse() check is needed because the save helper callback is not >> sure whether the stream is in use or not. This is a property of the >> save helper callback, rather than the stream. > In this event-driven code, I prefer to have a situation where the next > thing to do is obvious from the recorded state, and is calculated > explicitly (see also the way that check_all_finished is now). > > This is much less confusing and error prone than code where the next > thing to do depends on _both_ the recorded state, _and_ the call path > (representing the most recent thing that happened). This point is specifically a join point of the end of one asynchronous task, to another which is expected to be active and running. It is logically different to the other cases where stream_continue() is used. > >> Pushing the _inuse() check into the next layer would function, but it >> adds extra _inuse() checks to other codepaths which should be fatal if >> they failed in other contexts. > I don't understand how an _inuse check in stream_continue could ever > do the wrong thing. If the code is reorganised later, it seems more > likely to me that there will introduced new situations where we need > to do "see if we are still running, and if so do stream_continue, > otherwise go to check_all_finished". check_all_finished() must only ever be called once for each started task. Having if ( inuse ) continue() else check_all_finished() will make it far more likely to accidentally fire the overall stream callback twice. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 for-4.6 3/3] tools/libxl: Only continue stream operations if the stream is still in progress
Andrew Cooper writes ("Re: [PATCH v2 for-4.6 3/3] tools/libxl: Only continue stream operations if the stream is still in progress"): > check_all_finished() must only ever be called once for each started > task. Having > > if ( inuse ) > continue() > else > check_all_finished() > > will make it far more likely to accidentally fire the overall stream > callback twice. I think extra calls to check_all_finished are harmless. But anyway, I can see that this conversation isn't really converging. I think there is no functional difference between the two approaches we are debating. So this comes down to a matter of taste. I don't want to put my foot down as maintainer, especially since you wrote most of this code. So, I think the bikeshed might as well be your favoured shade of orange: Acked-by: Ian Jackson Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 for-4.6 3/3] tools/libxl: Only continue stream operations if the stream is still in progress
On Wed, 2015-07-29 at 15:17 +0100, Ian Campbell wrote: > On Tue, 2015-07-28 at 18:07 +0100, Ian Jackson wrote: > > Andrew Cooper writes ("Re: [PATCH v2 for-4.6 3/3] tools/libxl: Only > > continue stream operations if the stream is still in progress"): > > > check_all_finished() must only ever be called once for each started > > > task. Having > > > > > > if ( inuse ) > > > continue() > > > else > > > check_all_finished() > > > > > > will make it far more likely to accidentally fire the overall stream > > > callback twice. > > > > I think extra calls to check_all_finished are harmless. > > > > But anyway, I can see that this conversation isn't really converging. > > I think there is no functional difference between the two approaches > > we are debating. So this comes down to a matter of taste. > > > > I don't want to put my foot down as maintainer, especially since you > > wrote most of this code. So, I think the bikeshed might as well be > > your favoured shade of orange: > > > > Acked-by: Ian Jackson > > This is an ack to the patch in < > 1438015647-25377-4-git-send-email-andrew.coop...@citrix.com> i.e. the > original 3/3 of this posting, right? > > (and not, for example, an ack to some other proposal made during this > thread) Nevermind, I see now that this came back in "Fix libxl TOOLSTACK records for migration v2" with the ack in place. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 for-4.6 3/3] tools/libxl: Only continue stream operations if the stream is still in progress
On Tue, 2015-07-28 at 18:07 +0100, Ian Jackson wrote: > Andrew Cooper writes ("Re: [PATCH v2 for-4.6 3/3] tools/libxl: Only > continue stream operations if the stream is still in progress"): > > check_all_finished() must only ever be called once for each started > > task. Having > > > > if ( inuse ) > > continue() > > else > > check_all_finished() > > > > will make it far more likely to accidentally fire the overall stream > > callback twice. > > I think extra calls to check_all_finished are harmless. > > But anyway, I can see that this conversation isn't really converging. > I think there is no functional difference between the two approaches > we are debating. So this comes down to a matter of taste. > > I don't want to put my foot down as maintainer, especially since you > wrote most of this code. So, I think the bikeshed might as well be > your favoured shade of orange: > > Acked-by: Ian Jackson This is an ack to the patch in < 1438015647-25377-4-git-send-email-andrew.coop...@citrix.com> i.e. the original 3/3 of this posting, right? (and not, for example, an ack to some other proposal made during this thread) Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel