Re: Live migration and PV device handling

2020-04-07 Thread Tamas K Lengyel
On Tue, Apr 7, 2020 at 1:57 AM Paul Durrant  wrote:
>
> > -Original Message-
> > From: Xen-devel  On Behalf Of Tamas 
> > K Lengyel
> > Sent: 06 April 2020 18:31
> > To: Andrew Cooper 
> > Cc: Xen-devel ; Anastassios Nanos 
> > 
> > Subject: Re: Live migration and PV device handling
> >
> > On Mon, Apr 6, 2020 at 11:24 AM Andrew Cooper  
> > wrote:
> > >
> > > On 06/04/2020 18:16, Tamas K Lengyel wrote:
> > > > On Fri, Apr 3, 2020 at 6:44 AM Andrew Cooper 
> > > >  wrote:
> > > >> On 03/04/2020 13:32, Anastassios Nanos wrote:
> > > >>> Hi all,
> > > >>>
> > > >>> I am trying to understand how live-migration happens in xen. I am
> > > >>> looking in the HVM guest case and I have dug into the relevant parts
> > > >>> of the toolstack and the hypervisor regarding memory, vCPU context
> > > >>> etc.
> > > >>>
> > > >>> In particular, I am interested in how PV device migration happens. I
> > > >>> assume that the guest is not aware of any suspend/resume operations
> > > >>> being done
> > > >> Sadly, this assumption is not correct.  HVM guests with PV drivers
> > > >> currently have to be aware in exactly the same way as PV guests.
> > > >>
> > > >> Work is in progress to try and address this.  See
> > > >> https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=775a02452ddf3a6889690de90b1a94eb29c3c732
> > > >> (sorry - for some reason that doc isn't being rendered properly in
> > > >> https://xenbits.xen.org/docs/ )
> > > > That proposal is very interesting - first time it came across my radar
> > > > - but I dislike the idea that domain IDs need to be preserved for
> > > > uncooperative migration to work.
> > >
> > > The above restriction is necessary to work with existing guests, which
> > > is an implementation requirement of the folks driving the work.
> > >
> > > > Ideally I would be able to take
> > > > advantage of the same plumbing to perform forking of VMs with PV
> > > > drivers where preserving the domain id is impossible since it's still
> > > > in use.
> > >
> > > We would of course like to make changes to remove the above restriction
> > > in the long term.  The problem is that it is not a trivial thing to fix.
> > > Various things were discussed in Chicago, but I don't recall if any of
> > > the plans made their way onto xen-devel.
> >
> > Yeah, I imagine trying to get this to work with existing PV drivers is
> > not possible in any other way.
>
> No, as the doc says, the domid forms part of the protocol, hence being 
> visible to the guest, and the guest may sample and use the value when making 
> certain hypercalls (only some enforce use of DOMID_SELF). Thus faking it 
> without risking a guest crash is going to be difficult.
>
> > But if we can update the PV driver code
> > such that in the long term it can work without preserving the domain
> > ID, that would be worthwhile.
> >
>
> I think that ship has sailed. It would probably be simpler and cheaper to 
> just get virtio working with Xen.

That would certainly make sense to me. That would reduce the
maintenance overhead considerably if we all converged on a single
standard.

Tamas



RE: Live migration and PV device handling

2020-04-07 Thread Paul Durrant
> -Original Message-
> From: Xen-devel  On Behalf Of Tamas K 
> Lengyel
> Sent: 06 April 2020 18:31
> To: Andrew Cooper 
> Cc: Xen-devel ; Anastassios Nanos 
> 
> Subject: Re: Live migration and PV device handling
> 
> On Mon, Apr 6, 2020 at 11:24 AM Andrew Cooper  
> wrote:
> >
> > On 06/04/2020 18:16, Tamas K Lengyel wrote:
> > > On Fri, Apr 3, 2020 at 6:44 AM Andrew Cooper  
> > > wrote:
> > >> On 03/04/2020 13:32, Anastassios Nanos wrote:
> > >>> Hi all,
> > >>>
> > >>> I am trying to understand how live-migration happens in xen. I am
> > >>> looking in the HVM guest case and I have dug into the relevant parts
> > >>> of the toolstack and the hypervisor regarding memory, vCPU context
> > >>> etc.
> > >>>
> > >>> In particular, I am interested in how PV device migration happens. I
> > >>> assume that the guest is not aware of any suspend/resume operations
> > >>> being done
> > >> Sadly, this assumption is not correct.  HVM guests with PV drivers
> > >> currently have to be aware in exactly the same way as PV guests.
> > >>
> > >> Work is in progress to try and address this.  See
> > >> https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=775a02452ddf3a6889690de90b1a94eb29c3c732
> > >> (sorry - for some reason that doc isn't being rendered properly in
> > >> https://xenbits.xen.org/docs/ )
> > > That proposal is very interesting - first time it came across my radar
> > > - but I dislike the idea that domain IDs need to be preserved for
> > > uncooperative migration to work.
> >
> > The above restriction is necessary to work with existing guests, which
> > is an implementation requirement of the folks driving the work.
> >
> > > Ideally I would be able to take
> > > advantage of the same plumbing to perform forking of VMs with PV
> > > drivers where preserving the domain id is impossible since it's still
> > > in use.
> >
> > We would of course like to make changes to remove the above restriction
> > in the long term.  The problem is that it is not a trivial thing to fix.
> > Various things were discussed in Chicago, but I don't recall if any of
> > the plans made their way onto xen-devel.
> 
> Yeah, I imagine trying to get this to work with existing PV drivers is
> not possible in any other way.

No, as the doc says, the domid forms part of the protocol, hence being visible 
to the guest, and the guest may sample and use the value when making certain 
hypercalls (only some enforce use of DOMID_SELF). Thus faking it without 
risking a guest crash is going to be difficult.
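
As a rough illustration of that point, here is a hedged sketch (not actual
frontend or xenbus code; read_backend_id() and grant_page_to() are
hypothetical stand-ins for the real xenstore "backend-id" lookup and the
grant-table interface) of how the backend's domid ends up cached inside the
guest:

/* Hedged sketch: a PV frontend samples the backend domid once and then
 * names it explicitly in later operations. */
#include <stdint.h>
#include <stdio.h>

typedef uint16_t domid_t;

/* Hypothetical: would read ".../backend-id" from the frontend's xenstore
 * directory at connect time. */
static domid_t read_backend_id(const char *frontend_path)
{
    (void)frontend_path;
    return 0;                     /* typically dom0 or a driver domain */
}

/* Hypothetical: would issue the grant-table operation naming the backend. */
static int grant_page_to(domid_t backend, unsigned long gfn)
{
    printf("grant gfn %#lx to domid %u\n", gfn, (unsigned)backend);
    return 42;                    /* pretend grant reference */
}

int main(void)
{
    /* DOMID_SELF only covers operations the guest makes about itself; grant
     * and event-channel setup name the other end explicitly, so the cached
     * backend domid is guest-visible state that a migration must preserve. */
    domid_t backend = read_backend_id("device/vif/0");
    int gref = grant_page_to(backend, 0x1234);
    printf("grant ref %d held against backend domid %u\n", gref, (unsigned)backend);
    return 0;
}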

> But if we can update the PV driver code
> such that in the long term it can work without preserving the domain
> ID, that would be worthwhile.
> 

I think that ship has sailed. It would probably be simpler and cheaper to just 
get virtio working with Xen.

  Paul





Re: Live migration and PV device handling

2020-04-06 Thread Tamas K Lengyel
On Mon, Apr 6, 2020 at 11:24 AM Andrew Cooper  wrote:
>
> On 06/04/2020 18:16, Tamas K Lengyel wrote:
> > On Fri, Apr 3, 2020 at 6:44 AM Andrew Cooper  
> > wrote:
> >> On 03/04/2020 13:32, Anastassios Nanos wrote:
> >>> Hi all,
> >>>
> >>> I am trying to understand how live-migration happens in xen. I am
> >>> looking in the HVM guest case and I have dug into the relevant parts
> >>> of the toolstack and the hypervisor regarding memory, vCPU context
> >>> etc.
> >>>
> >>> In particular, I am interested in how PV device migration happens. I
> >>> assume that the guest is not aware of any suspend/resume operations
> >>> being done
> >> Sadly, this assumption is not correct.  HVM guests with PV drivers
> >> currently have to be aware in exactly the same way as PV guests.
> >>
> >> Work is in progress to try and address this.  See
> >> https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=775a02452ddf3a6889690de90b1a94eb29c3c732
> >> (sorry - for some reason that doc isn't being rendered properly in
> >> https://xenbits.xen.org/docs/ )
> > That proposal is very interesting - first time it came across my radar
> > - but I dislike the idea that domain IDs need to be preserved for
> > uncooperative migration to work.
>
> The above restriction is necessary to work with existing guests, which
> is an implementation requirement of the folks driving the work.
>
> > Ideally I would be able to take
> > advantage of the same plumbing to perform forking of VMs with PV
> > drivers where preserving the domain id is impossible since it's still
> > in use.
>
> We would of course like to make changes to remove the above restriction
> in the long term.  The problem is that it is not a trivial thing to fix.
> Various things were discussed in Chicago, but I don't recall if any of
> the plans made their way onto xen-devel.

Yeah, I imagine trying to get this to work with existing PV drivers is
not possible in any other way. But if we can update the PV driver code
such that in the long term it can work without preserving the domain
ID, that would be worthwhile.

Tamas



Re: Live migration and PV device handling

2020-04-06 Thread Andrew Cooper
On 06/04/2020 18:16, Tamas K Lengyel wrote:
> On Fri, Apr 3, 2020 at 6:44 AM Andrew Cooper  
> wrote:
>> On 03/04/2020 13:32, Anastassios Nanos wrote:
>>> Hi all,
>>>
>>> I am trying to understand how live-migration happens in xen. I am
>>> looking in the HVM guest case and I have dug into the relevant parts
>>> of the toolstack and the hypervisor regarding memory, vCPU context
>>> etc.
>>>
>>> In particular, I am interested in how PV device migration happens. I
>>> assume that the guest is not aware of any suspend/resume operations
>>> being done
>> Sadly, this assumption is not correct.  HVM guests with PV drivers
>> currently have to be aware in exactly the same way as PV guests.
>>
>> Work is in progress to try and address this.  See
>> https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=775a02452ddf3a6889690de90b1a94eb29c3c732
>> (sorry - for some reason that doc isn't being rendered properly in
>> https://xenbits.xen.org/docs/ )
> That proposal is very interesting - first time it came across my radar
> - but I dislike the idea that domain IDs need to be preserved for
> uncooperative migration to work.

The above restriction is necessary to work with existing guests, which
is an implementation requirement of the folks driving the work.

> Ideally I would be able to take
> advantage of the same plumbing to perform forking of VMs with PV
> drivers where preserving the domain id is impossible since it's still
> in use.

We would of course like to make changes to remove the above restriction
in the long term.  The problem is that it is not a trivial thing to fix.
Various things were discussed in Chicago, but I don't recall if any of
the plans made their way onto xen-devel.

~Andrew



Re: Live migration and PV device handling

2020-04-06 Thread Tamas K Lengyel
On Fri, Apr 3, 2020 at 6:44 AM Andrew Cooper  wrote:
>
> On 03/04/2020 13:32, Anastassios Nanos wrote:
> > Hi all,
> >
> > I am trying to understand how live-migration happens in xen. I am
> > looking in the HVM guest case and I have dug into the relevant parts
> > of the toolstack and the hypervisor regarding memory, vCPU context
> > etc.
> >
> > In particular, I am interested in how PV device migration happens. I
> > assume that the guest is not aware of any suspend/resume operations
> > being done
>
> Sadly, this assumption is not correct.  HVM guests with PV drivers
> currently have to be aware in exactly the same way as PV guests.
>
> Work is in progress to try and address this.  See
> https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=775a02452ddf3a6889690de90b1a94eb29c3c732
> (sorry - for some reason that doc isn't being rendered properly in
> https://xenbits.xen.org/docs/ )

That proposal is very interesting - first time it came across my radar
- but I dislike the idea that domain IDs need to be preserved for
uncooperative migration to work. Ideally I would be able to take
advantage of the same plumbing to perform forking of VMs with PV
drivers where preserving the domain id is impossible since it's still
in use.

Tamas



Re: Live migration and PV device handling

2020-04-06 Thread Andrew Cooper
On 06/04/2020 08:50, Paul Durrant wrote:
>> -Original Message-
>> From: Xen-devel  On Behalf Of Dongli 
>> Zhang
>> Sent: 03 April 2020 23:33
>> To: Andrew Cooper ; Anastassios Nanos 
>> ; xen-
>> de...@lists.xen.org
>> Subject: Re: Live migration and PV device handling
>>
>> Hi Andrew,
>>
>> On 4/3/20 5:42 AM, Andrew Cooper wrote:
>>> On 03/04/2020 13:32, Anastassios Nanos wrote:
>>>> Hi all,
>>>>
>>>> I am trying to understand how live-migration happens in xen. I am
>>>> looking in the HVM guest case and I have dug into the relevant parts
>>>> of the toolstack and the hypervisor regarding memory, vCPU context
>>>> etc.
>>>>
>>>> In particular, I am interested in how PV device migration happens. I
>>>> assume that the guest is not aware of any suspend/resume operations
>>>> being done
>>> Sadly, this assumption is not correct.  HVM guests with PV drivers
>>> currently have to be aware in exactly the same way as PV guests.
>>>
>>> Work is in progress to try and address this.  See
>>> https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=775a02452ddf3a6889690de90b1a94eb29c3c732
>>> (sorry - for some reason that doc isn't being rendered properly in
>>> https://xenbits.xen.org/docs/ )

Document rendering now fixed.

https://xenbits.xen.org/docs/unstable/designs/non-cooperative-migration.html

>> I read the following in the commit:
>>
>> +* The toolstack chooses a randomized domid for initial creation or default
>> +migration, but preserves the source domid for non-cooperative migration.
>> +Non-Cooperative migration will have to be denied if the domid is
>> +unavailable on the target host, but randomization of domid on creation
>> +should hopefully minimize the likelihood of this. Non-Cooperative migration
>> +to localhost will clearly not be possible.
>>
>> Does that indicate that, while the scope of domid_t is a single server in
>> the old design, the scope of domid_t is a cluster of servers in the new
>> design?
>>
>> That is, the domid should be unique across the cluster of servers if we
>> expect non-cooperative migration to always succeed?
>>
> That would be necessary to guarantee success (or rather guarantee no failure 
> due to domid clash), but the scope of xl/libxl is a single server, hence
> randomization is the best we have to reduce clashes to a minimum.

Domids are inherently a local concept and will remain so, but a
toolstack managing multiple servers and wanting to use this version of
non-cooperative migration will have to manage domids cluster-wide.
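
For illustration only, a hedged sketch of what that could mean for a
multi-host toolstack; cluster_domid_reserve() is a hypothetical call into
whatever shared inventory such a toolstack keeps, not an existing Xen or
libxl interface:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint16_t domid_t;

/* Hypothetical: reserve the domid in a cluster-wide inventory, failing if
 * any host already has it.  Stubbed so the example runs standalone. */
static bool cluster_domid_reserve(domid_t d)
{
    return d != 5;                /* pretend domid 5 is taken somewhere */
}

/* The receiving toolstack must hold the source domid cluster-wide before
 * accepting a non-cooperative migration, because the guest keeps using
 * that value after the move. */
static bool accept_non_cooperative_migration(domid_t source_domid)
{
    if (!cluster_domid_reserve(source_domid)) {
        fprintf(stderr, "domid %u already in use in the cluster, refusing\n",
                (unsigned)source_domid);
        return false;
    }
    /* ... proceed to restore the domain under source_domid ... */
    return true;
}

int main(void)
{
    printf("domid 7: %s\n",
           accept_non_cooperative_migration(7) ? "accepted" : "refused");
    printf("domid 5: %s\n",
           accept_non_cooperative_migration(5) ? "accepted" : "refused");
    return 0;
}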

~Andrew



RE: Live migration and PV device handling

2020-04-06 Thread Paul Durrant
> -Original Message-
> From: Xen-devel  On Behalf Of Dongli 
> Zhang
> Sent: 03 April 2020 23:33
> To: Andrew Cooper ; Anastassios Nanos 
> ; xen-
> de...@lists.xen.org
> Subject: Re: Live migration and PV device handling
> 
> Hi Andrew,
> 
> On 4/3/20 5:42 AM, Andrew Cooper wrote:
> > On 03/04/2020 13:32, Anastassios Nanos wrote:
> >> Hi all,
> >>
> >> I am trying to understand how live-migration happens in xen. I am
> >> looking in the HVM guest case and I have dug into the relevant parts
> >> of the toolstack and the hypervisor regarding memory, vCPU context
> >> etc.
> >>
> >> In particular, I am interested in how PV device migration happens. I
> >> assume that the guest is not aware of any suspend/resume operations
> >> being done
> >
> > Sadly, this assumption is not correct.  HVM guests with PV drivers
> > currently have to be aware in exactly the same way as PV guests.
> >
> > Work is in progress to try and address this.  See
> > https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=775a02452ddf3a6889690de90b1a94eb29c3c732
> > (sorry - for some reason that doc isn't being rendered properly in
> > https://xenbits.xen.org/docs/ )
> >
> 
> I read the following in the commit:
> 
> +* The toolstack chooses a randomized domid for initial creation or default
> +migration, but preserves the source domid for non-cooperative migration.
> +Non-Cooperative migration will have to be denied if the domid is
> +unavailable on the target host, but randomization of domid on creation
> +should hopefully minimize the likelihood of this. Non-Cooperative migration
> +to localhost will clearly not be possible.
> 
> Does that indicate that, while the scope of domid_t is a single server in the
> old design, the scope of domid_t is a cluster of servers in the new design?
> 
> That is, the domid should be unique across the cluster of servers if we
> expect non-cooperative migration to always succeed?
> 

That would be necessary to guarantee success (or rather guarantee no failure 
due to domid clash), but the scope of xl/libxl is a single server, hence
randomization is the best we have to reduce clashes to a minimum.
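
As a rough sketch of that policy (hedged, and not the actual libxl code;
domid_in_use() is a hypothetical stand-in for asking the local hypervisor
which domids exist, while DOMID_FIRST_RESERVED and DOMID_INVALID are the
constants from Xen's public headers):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef uint16_t domid_t;
#define DOMID_FIRST_RESERVED 0x7FF0U   /* domids at or above this are special */
#define DOMID_INVALID        0x7FF4U

/* Hypothetical: would query the hypervisor for domains on this host. */
static bool domid_in_use(domid_t d)
{
    return d == 5;                /* stub: pretend domid 5 exists locally */
}

/* Creation and default (cooperative) migration: any free domid will do;
 * picking it at random reduces the chance of a clash on a later
 * non-cooperative migration to another host. */
static domid_t pick_random_domid(void)
{
    domid_t d;
    do {
        d = 1 + (domid_t)(rand() % (DOMID_FIRST_RESERVED - 1));
    } while (domid_in_use(d));
    return d;
}

/* Non-cooperative migration: the source domid must be preserved, so the
 * incoming domain is refused if that domid is already taken here (which is
 * also why non-cooperative migration to localhost cannot work). */
static domid_t pick_incoming_domid(domid_t source_domid, bool non_cooperative)
{
    if (!non_cooperative)
        return pick_random_domid();
    return domid_in_use(source_domid) ? DOMID_INVALID : source_domid;
}

int main(void)
{
    srand((unsigned)time(NULL));
    printf("fresh domain gets domid %u\n", (unsigned)pick_random_domid());
    printf("incoming non-cooperative (source domid 7): %u\n",
           (unsigned)pick_incoming_domid(7, true));
    return 0;
}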

  Paul




Re: Live migration and PV device handling

2020-04-03 Thread Dongli Zhang
Hi Andrew,

On 4/3/20 5:42 AM, Andrew Cooper wrote:
> On 03/04/2020 13:32, Anastassios Nanos wrote:
>> Hi all,
>>
>> I am trying to understand how live-migration happens in xen. I am
>> looking in the HVM guest case and I have dug into the relevant parts
>> of the toolstack and the hypervisor regarding memory, vCPU context
>> etc.
>>
>> In particular, I am interested in how PV device migration happens. I
>> assume that the guest is not aware of any suspend/resume operations
>> being done
> 
> Sadly, this assumption is not correct.  HVM guests with PV drivers
> currently have to be aware in exactly the same way as PV guests.
> 
> Work is in progress to try and address this.  See
> https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=775a02452ddf3a6889690de90b1a94eb29c3c732
> (sorry - for some reason that doc isn't being rendered properly in
> https://xenbits.xen.org/docs/ )
> 

I read the following in the commit:

+* The toolstack chooses a randomized domid for initial creation or default
+migration, but preserves the source domid for non-cooperative migration.
+Non-Cooperative migration will have to be denied if the domid is
+unavailable on the target host, but randomization of domid on creation
+should hopefully minimize the likelihood of this. Non-Cooperative migration
+to localhost will clearly not be possible.

Does that indicate that, while the scope of domid_t is a single server in the
old design, the scope of domid_t is a cluster of servers in the new design?

That is, the domid should be unique across the cluster of servers if we expect
non-cooperative migration to always succeed?

Thank you very much!

Dongli Zhang



Re: Live migration and PV device handling

2020-04-03 Thread Andrew Cooper
On 03/04/2020 13:32, Anastassios Nanos wrote:
> Hi all,
>
> I am trying to understand how live-migration happens in xen. I am
> looking in the HVM guest case and I have dug into the relevant parts
> of the toolstack and the hypervisor regarding memory, vCPU context
> etc.
>
> In particular, I am interested in how PV device migration happens. I
> assume that the guest is not aware of any suspend/resume operations
> being done

Sadly, this assumption is not correct.  HVM guests with PV drivers
currently have to be aware in exactly the same way as PV guests.

Work is in progress to try and address this.  See
https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=775a02452ddf3a6889690de90b1a94eb29c3c732
(sorry - for some reason that doc isn't being rendered properly in
https://xenbits.xen.org/docs/ )

~Andrew