Wei Liu writes ("[RFC] Classify and remove (some) abort()s in libxl"):
> There has been some interest in removing abort() in libxl in another
> thread. I think this topic deserves a dedicated thread.
>
> I've checked most instances of abort() and exit() in code and classify
> them as several classes.
I'm not sure I quite agree with the classification of all your
example, but your categories seem mostly sound.
Did you search just for abort() or also for assert() ? There are a
lot more of the latter, and they are IMO equivalent.
Some people think abort() is unfriendly because it doesn't print
anything to stderr. I would be sympathetic to patches to change each
abort call into something like assert(!"description of problem"). (NB
there may be some exceptional aborts in signal handlers or some such
which ought not to try to enter stdio.)
> * System has entered an impossible to recover state
>
> Can't be removed, there is no meaningful return code to return.
> E.g. libxl_utils.c, libxl_event.c, libxl_exec.c and libxl_fork.c
>
> * Used by some stub functions
>
> Can be classified as "impossible to recover state" because caller
> shouldn't have use them in the first place. But can be relaxed to
> return error code.
>
> * Configuration error
>
> Some internal functions expect sanitised input. Up until now the
> expectation (at least AIUI) is that libxl should have sanitised those
> values before calling internal functions. I'm not sure if this rule is
> strictly followed though.
>
> The abort() in this class can be and turn into error return path.
>
> E.g. various devices and domain configuration options
>
> The "configuration error" class is the easiest one to trip over for
> library user. I think we can change that class to return error code
> provided there is enough interest.
There should certainly be no way to get libxl to abort simply by
passing it invalid combinations of parameters, or wrong parameter
values. If there are any such cases
(NB by "wrong parameter values" I do not mean to include NULL pointers
where a non-NULL pointer is always required.)
I would support changing any abort (including assertion failure),
which results from anomalous domain states or configurations, into an
appropriate error return, unless it is obvious from the nearby code
that the abort is really excluded.
(An example of one that ought to be excluded is the abort() in switch
statements on domain type. I'm not convinced it's useful to try to
handle libxl__domain_type giving invalid answers. But I would welcome
suggestions of an approach which would give a static warning if we
accidentally omit one of the cases.)
> * Memory allocation failure
>
> Actually exit() is called, but process will exit anyway. Can't be
> easily changed without rewriting error handle logic across libxl.
We decided a few years ago to take the approach that memory allocation
failure in libxl would be fatal for the process. Before we took that
decision, most allocation sites were already cavalier about this
issue.
We could reverse that decision but it would considerably complicate
some areas of the code (with tiresome error handling). I would want
to see a commitment by someone proposing such a change to overhaul the
whole of libxl to handle allocation failures _and_ to review all new
code for what is effectively then a newly-proposed class of bug.
> The "stub functions" class can also be dealt with, but I'm not too keen
> on changing that.
These fall into two cases:
* aborts for which it is reasonably straightforward to prove that
they can't occur. (These are often in stub functions which can
never be called, such as osevent_hook_failed_release.)
* aborts which result from serious breaches by the application of
rules which are needed for the integrity of libxl's data structures
(and the integrity of the whole process).
Many of the second category could in principle be turned into error
returns but I don't think that would be wise. During the initial
development of the new libxl approach to event handling, and the
development of the corresponding code in libvirt, we discovered a
number of bugs which triggered these kind of assertions.
I think those bugs would have been a lot harder to find if they had
simply produced error returns. In many cases the error occurred on
some kind of teardown path and might even have been ignored.
Thanks,
Ian.
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel