On Thu, May 4, 2023 at 3:44 AM Jan Beulich <jbeul...@suse.com> wrote:
>
> On 03.05.2023 19:14, Tamas K Lengyel wrote:
> >> @@ -1974,7 +2046,10 @@ int mem_sharing_fork_reset(struct domain
> >>
> >>   state:
> >>      if ( reset_state )
> >> +    {
> >>          rc = copy_settings(d, pd);
> >> +        /* TBD: What to do here with -ERESTART? */
> >
> > Ideally we could avoid hitting code-paths that are restartable during
> > fork reset since it gets called from vm_event replies that have no
> > concept of handling errors. If we start having errors like this we
> > would just have to drop the vm_event reply optimization and issue a
> > standalone fork reset hypercall every time which isn't a big deal,
> > it's just slower.
>
> I'm afraid I don't follow: We are in the process of fork-reset here. How
> would issuing "a standalone fork reset hypercall every time" make this
> any different? The possible need for a continuation here comes from a
> failed spin_trylock() in map_guest_area(). That won't change the next
> time round.

Why not? Who is holding the lock, and why wouldn't it ever relinquish it?
If that's really true then there is a larger issue than just not being
able to report the error back to the user on the vm_event_resume path,
and we need to devise a way of copying this from the parent that bypasses
the lock. The parent is paused and its state should not be changing while
forks are active, so turning the lock into an rwlock of some sort and
acquiring the read lock on this path would perhaps be a possible way out
of this.
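
To make the idea a bit more concrete, below is a rough sketch (the lock
field and helper names are made up, and this is not written against the
actual structures map_guest_area() deals with) of what the fork-side copy
could look like if the contended lock became a Xen rwlock_t:

/*
 * Hypothetical sketch only: assumes the lock map_guest_area() currently
 * attempts with spin_trylock() could become an rwlock_t guarding the
 * parent's guest-area state.  Field and helper names are invented.
 */
static int copy_guest_areas_from_parent(struct domain *d  /* fork */,
                                        struct domain *pd /* parent */)
{
    int rc;

    /*
     * The parent is paused while forks exist, so this path only reads
     * its state; readers never block each other, so no -ERESTART
     * continuation would be needed here.
     */
    read_lock(&pd->guest_area_lock);   /* hypothetical field */
    rc = copy_guest_areas(d, pd);      /* hypothetical helper */
    read_unlock(&pd->guest_area_lock);

    return rc;
}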

>
> But perhaps I should say that till now I didn't even pay much attention
> to the 2nd use of the function by vm_event_resume(); I was mainly
> focused on the one from XENMEM_sharing_op_fork_reset, where no
> continuation handling exists. Yet perhaps your comment is mainly
> related to that use?
>
> I actually notice that the comment ahead of the function already has a
> continuation related TODO, just that there thought is only of larger
> memory footprint.

With XENMEM_sharing_op_fork_reset the caller actually receives the error
code and can decide what to do next. With vm_event_resume there is no path
currently to notify the agent of an error. We could generate another
vm_event to send such an error, but the expectation with fork_reset is that
it will always work because the parent is paused, so not having that path
for an error to get back to the agent isn't a big deal.

Now, if it becomes the case that due to this locking we can get an error
even while the parent is paused, that will render the vm_event_resume
path unreliable, so we would just switch to using
XENMEM_sharing_op_fork_reset so that the agent can at least retry in case
of an issue. Of course, that only helps if reissuing the hypercall has
any reasonable chance of succeeding.
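
For illustration, the agent-side fallback could be as simple as a bounded
retry loop around the standalone hypercall. This is only a sketch: the
exact xc_memshr_fork_reset() arguments (whether it takes the
reset_state/reset_memory flags) depend on the libxc in use, and the retry
bound is arbitrary.

/*
 * Illustrative sketch: retry a standalone fork reset instead of relying
 * on the vm_event reply optimization when the reset can fail transiently.
 * Check the xc_memshr_fork_reset() prototype of your libxc; the flag
 * arguments below assume the reset_state/reset_memory variant.
 */
#include <stdbool.h>
#include <stdio.h>
#include <xenctrl.h>

#define FORK_RESET_RETRIES 8  /* arbitrary bound */

static int fork_reset_with_retry(xc_interface *xch, uint32_t fork_domid)
{
    int rc = -1;

    for ( unsigned int i = 0; i < FORK_RESET_RETRIES; i++ )
    {
        /* Reset both the state and the memory of the fork. */
        rc = xc_memshr_fork_reset(xch, fork_domid, true, true);
        if ( !rc )
            break;

        fprintf(stderr, "fork reset attempt %u failed: rc=%d\n",
                i + 1, rc);
    }

    return rc;
}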

>
> > My preference would actually be that after the initial forking is
> > performed a local copy of the parent's state is maintained for the
> > fork to reset to so there would be no case of hitting an error like
> > this. It would also allow us in the future to unpause the parent for
> > example..
>
> Oh, I guess I didn't realize the parent was kept paused. Such state
> copying / caching may then indeed be a possibility, but that's nothing
> I can see myself deal with, even less so in the context of this series.
> I need a solution to the problem at hand within the scope of what is
> there right now (or based on what could be provided e.g. by you within
> the foreseeable future). Bubbling up the need for continuation from the
> XENMEM_sharing_op_fork_reset path is the most I could see me handle
> myself ... For vm_event_resume() bubbling state up the domctl path
> _may_ also be doable, but mem_sharing_notification() and friends don't
> even check the function's return value.

Sure, I wasn't expecting that work to be done as part of this series, just
as something I would like to get to in the future.

Tamas
