On 29/01/2020 14:47, Paul Durrant wrote:
> diff --git a/docs/designs/non-cooperative-migration.md b/docs/designs/non-cooperative-migration.md
> new file mode 100644
> index 0000000000..5db3939db5
> --- /dev/null
> +++ b/docs/designs/non-cooperative-migration.md
> @@ -0,0 +1,272 @@
> +# Non-Cooperative Migration of Guests on Xen
> +
> +## Background
> +
> +The normal model of migration in Xen is driven by the guest because it was
> +originally implemented for PV guests, where the guest must be aware it is
> +running under Xen and is hence expected to co-operate. 

For PV guests, it is more than being "expected to co-operate".

Migrating a PV guest involves rewriting every pagetable entry with a
different MFN, so even before you consider things like the PV protocols,
there is no way this could be done without the cooperation of the guest.

Sadly, this fact was depended upon for migration of the PV protocols,
and has migrated (excuse the pun) into the HVM world as well.
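
To illustrate the point: every PTE in every pagetable page has to be rewritten twice, once on save (MFN to host-independent PFN) and once on restore (PFN to the MFN allocated on the new host). The toy M2P/P2M tables and helper names below are hypothetical stand-ins for what the real save/restore logic in libxenguest does:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative sketch only: toy M2P/P2M tables standing in for Xen's
 * machine_to_phys_mapping and the guest's p2m on the destination host.
 * Real PV save/restore does this for every PTE of every pagetable page.
 */
#define TOY_PAGES 4
static const uint64_t m2p[TOY_PAGES] = { 2, 0, 3, 1 }; /* MFN -> PFN (source) */
static const uint64_t p2m[TOY_PAGES] = { 1, 3, 0, 2 }; /* PFN -> MFN (destination) */

#define PTE_FLAGS_MASK 0xfffULL /* low flag bits of a 64-bit PTE */

/* On save: rewrite the PTE's MFN as a host-independent PFN. */
static uint64_t canonicalise_pte(uint64_t pte)
{
    uint64_t mfn = (pte & ~PTE_FLAGS_MASK) >> 12;
    return (m2p[mfn] << 12) | (pte & PTE_FLAGS_MASK);
}

/* On restore: rewrite the PFN back to the MFN allocated on the new host. */
static uint64_t uncanonicalise_pte(uint64_t pte)
{
    uint64_t pfn = (pte & ~PTE_FLAGS_MASK) >> 12;
    return (p2m[pfn] << 12) | (pte & PTE_FLAGS_MASK);
}
```

Because the PTEs live in guest-owned pagetable pages which the guest is actively using, this rewriting cannot happen behind the guest's back.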

> This model dates from
> +an era when it was assumed that the host administrator had control of at least
> +the privileged software running in the guest (i.e. the guest kernel) which may
> +still be true in an enterprise deployment but is not generally true in a cloud
> +environment.

I haven't seen it discussed elsewhere, but even enterprise environments
have problems.

Having host admin == guest admin doesn't mean that guest drivers aren't
buggy, or that the VM doesn't explode on migrate.

The simple fact is that involving the guest kernel adds unnecessary
moving parts which can (and do with a non-zero probability) go wrong.

>  The aim of this design is to provide a model which is purely host
> +driven, requiring no co-operation from the software running in the
> +guest, and is thus suitable for cloud scenarios.
> +
> +PV guests are out of scope for this project because, as is outlined above, they
> +have a symbiotic relationship with the hypervisor and therefore a certain level
> +of co-operation can be assumed.

If nothing else, I'd at least suggest s/can be assumed/is necessary/.

> +Because the service domain’s domid is used directly by the guest in setting
> +up grant entries and event channels, the backend drivers in the new host
> +environment must be provided by a service domain with the same domid. Also,
> +because the guest can sample its own domid from the frontend area and use it in
> +hypercalls (e.g. HVMOP_set_param) rather than DOMID_SELF, the guest domid must
> +also be preserved to maintain the ABI.

Has this been true since forever?  The grant and event APIs took some
care to avoid the guest needing to know its own domid.
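
Whether or not the APIs required it, a literal domid can end up in the hypercall argument: the `xen_hvm_param` structure carries an explicit domid field, and a guest that has sampled its own domid may pass that instead of DOMID_SELF. A sketch mirroring the structure from Xen's public/hvm/hvm_op.h (the helper is purely illustrative):

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t domid_t;
#define DOMID_SELF ((domid_t)0x7FF0)

/* Mirrors struct xen_hvm_param from Xen's public/hvm/hvm_op.h. */
struct xen_hvm_param {
    domid_t  domid; /* which domain the parameter belongs to */
    uint32_t index; /* HVM_PARAM_* index */
    uint64_t value;
};

/*
 * Toy helper: a guest passing its sampled domid rather than DOMID_SELF
 * makes the numeric domid part of the guest-visible ABI, so it must be
 * identical after migration.
 */
static struct xen_hvm_param
build_hvm_param(domid_t domid, uint32_t index, uint64_t value)
{
    struct xen_hvm_param p = { .domid = domid, .index = index, .value = value };
    return p;
}
```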

> +
> +Furthermore, it will be necessary to modify backend drivers to re-establish
> +communication with frontend drivers without perturbing the content of the
> +backend area or requiring any changes to the values of the xenstore state nodes.
> +
> +## Other Para-Virtual State
> +
> +### Shared Rings
> +
> +Because the console and store protocol shared pages are actually part of the
> +guest memory image (in an E820 reserved region just below 4G) 

Typically*.

Their exact location is entirely up to the domain builder, and tend not
to be there for PVH guests which aren't trying to fit the two frames
into a BAR.

> then the content
> +will get migrated as part of the guest memory image. Hence no additional code
> +is required to prevent any guest visible change in the content.

I do agree with this conclusion however.

> +### Shared Info
> +
> +There is already a record defined in *libxenctrl Domain Image Format* [3]
> +called `SHARED_INFO` which simply contains a complete copy of the domain’s
> +shared info page. It is not currently included in an HVM (type `0x0002`)
> +migration stream. It may be feasible to include it as an optional record
> +but it is not clear that the content of the shared info page ever needs
> +to be preserved for an HVM guest.
> +
> +For a PV guest the `arch_shared_info` sub-structure contains important
> +information about the guest’s P2M, but this information is not relevant for
> +an HVM guest where the P2M is not directly manipulated by the guest. The other
> +state contained in the `shared_info` structure relates to the domain wall-clock
> +(the state of which should already be transferred by the `RTC` HVM context
> +information contained in the `HVM_CONTEXT` save record) and some event
> +channel state (particularly if using the *2l* protocol). Event channel state
> +will need to be fully transferred if we are not going to require guest
> +co-operation to re-open the channels, and so it should be possible to re-build a
> +shared info page for an HVM guest from such other state.
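
Rebuilding the 2l state does look plausible: in shared_info it is just a pair of bitmaps (mirroring the `evtchn_pending`/`evtchn_mask` fields in Xen's public/xen.h), so the destination could repopulate them from per-channel records in the stream. A sketch, with an entirely hypothetical record layout:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BITS_PER_LONG (sizeof(unsigned long) * 8)

/* The 2l bitmaps, with the same shape as in Xen's struct shared_info. */
struct toy_shared_info {
    unsigned long evtchn_pending[BITS_PER_LONG];
    unsigned long evtchn_mask[BITS_PER_LONG];
};

/* Hypothetical per-channel save record, for illustration only. */
struct evtchn_record { uint32_t port; uint8_t pending, masked; };

/* Repopulate the shared info bitmaps from saved per-channel state. */
static void rebuild_evtchn_state(struct toy_shared_info *si,
                                 const struct evtchn_record *recs, int n)
{
    memset(si, 0, sizeof(*si));
    for (int i = 0; i < n; i++) {
        unsigned long bit = 1UL << (recs[i].port % BITS_PER_LONG);
        if (recs[i].pending)
            si->evtchn_pending[recs[i].port / BITS_PER_LONG] |= bit;
        if (recs[i].masked)
            si->evtchn_mask[recs[i].port / BITS_PER_LONG] |= bit;
    }
}
```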
> +
> +Note that the shared info page also contains an array of `XEN_LEGACY_MAX_VCPUS`
> +(32) `vcpu_info` structures. A domain may nominate a different guest physical
> +address to use for the vcpu info. This is mandatory if a domain wants to
> +use more than 32 vCPUs and optional for legacy vCPUs. This mapping is not
> +currently transferred in the migration state so this will either need to be
> +added into an existing save record, or an additional type of save record will
> +be needed.

For non-cooperative migration in the current ABI, a minimum is to know
where the shared info frame is mapped, so it can be re-mapped on behalf
of the guest on the destination side.
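
If the stream recorded the GFN, the toolstack could re-seat the frame with XENMEM_add_to_physmap on the guest's behalf. A sketch that builds (but does not issue) the hypercall argument, mirroring the structure from Xen's public/memory.h:

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t domid_t;
typedef unsigned long xen_ulong_t;

#define XENMEM_add_to_physmap   7
#define XENMAPSPACE_shared_info 0

/* Mirrors struct xen_add_to_physmap from Xen's public/memory.h. */
struct xen_add_to_physmap {
    domid_t      domid;  /* which domain to act on */
    uint16_t     size;   /* number of frames (range variants only) */
    unsigned int space;  /* XENMAPSPACE_* */
    xen_ulong_t  idx;    /* index into the source space */
    xen_ulong_t  gpfn;   /* GFN where the frame should appear */
};

/* Build the argument that would re-seat the shared info frame at the
   GFN recorded in the migration stream. */
static struct xen_add_to_physmap
remap_shared_info_arg(domid_t domid, xen_ulong_t gpfn)
{
    struct xen_add_to_physmap xatp = {
        .domid = domid,
        .space = XENMAPSPACE_shared_info,
        .idx   = 0,      /* there is only one shared info frame */
        .gpfn  = gpfn,
    };
    return xatp;
}
```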

The rest of this section will be very good evidence in the "new guest
ABI" design.

> +### Grant table
> +
> +The grant table is essentially the para-virtual equivalent of an IOMMU.

TBH, I think "shared memory" is a much better analogy than an IOMMU. 
OTOH, perhaps that doesn't cope with the grant copy aspect quite as well
as I'd like.
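
Either way, the analogy comes down to the same mechanism: a grant reference resolves to a frame only for the domain named in the entry and only if access is permitted, i.e. translation plus a permission check. A toy sketch (the two-entry table and `resolve_gref` helper are hypothetical; the entry layout mirrors `struct grant_entry_v1` from Xen's public/grant_table.h):

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t domid_t;

/* Mirrors struct grant_entry_v1 from Xen's public/grant_table.h. */
struct grant_entry_v1 {
    uint16_t flags; /* GTF_* */
    domid_t  domid; /* domain granted access, i.e. the backend */
    uint32_t frame; /* GFN of the shared page */
};

#define GTF_permit_access 1

/* Hypothetical two-entry grant table for illustration. */
static const struct grant_entry_v1 gnttab[2] = {
    { GTF_permit_access, 5, 0x1000 }, /* page 0x1000 shared with domain 5 */
    { 0,                 0, 0      }, /* unused entry */
};

/* Translate-and-check: only the named domain gets the frame. */
static int resolve_gref(uint32_t gref, domid_t requester, uint32_t *frame)
{
    if (gref >= 2 || !(gnttab[gref].flags & GTF_permit_access) ||
        gnttab[gref].domid != requester)
        return -1;
    *frame = gnttab[gref].frame;
    return 0;
}
```

Note the entry names the *backend's* domid, which is another place that domid is baked into guest memory across a migration.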

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
