Hi Paul,
On 13/02/2020 10:53, Paul Durrant wrote:
This patch details proposes extra migration data and xenstore protocol
extensions to support non-cooperative live migration of guests.
Signed-off-by: Paul Durrant <pdurr...@amazon.com>
---
Cc: Andrew Cooper <andrew.coop...@citrix.com>
Cc: George Dunlap <george.dun...@eu.citrix.com>
Cc: Ian Jackson <ian.jack...@eu.citrix.com>
Cc: Jan Beulich <jbeul...@suse.com>
Cc: Julien Grall <jul...@xen.org>
Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Cc: Stefano Stabellini <sstabell...@kernel.org>
Cc: Wei Liu <w...@xen.org>
v5:
- Add QUIESCE
- Make semantics of <index> in GET_DOMAIN_WATCHES more clear
v4:
- Drop the restrictions on special paths
v3:
- New in v3
---
docs/designs/xenstore-migration.md | 136 +++++++++++++++++++++++++++++
1 file changed, 136 insertions(+)
create mode 100644 docs/designs/xenstore-migration.md
diff --git a/docs/designs/xenstore-migration.md
b/docs/designs/xenstore-migration.md
new file mode 100644
index 0000000000..5cfe2d9a7d
--- /dev/null
+++ b/docs/designs/xenstore-migration.md
@@ -0,0 +1,136 @@
+# Xenstore Migration
+
+## Background
+
+The design for *Non-Cooperative Migration of Guests*[1] explains that extra
+save records are required in the migrations stream to allow a guest running
+PV drivers to be migrated without its co-operation. Moreover the save
+records must include details of registered xenstore watches as well as
+content; information that cannot currently be recovered from `xenstored`,
+and hence some extension to the xenstore protocol[2] will also be required.
+
+The *libxenlight Domain Image Format* specification[3] already defines a
+record type `EMULATOR_XENSTORE_DATA` but this is not suitable for
+transferring xenstore data pertaining to the domain directly as it is
+specified such that keys are relative to the path
+`/local/domain/$dm_domid/device-model/$domid`. Thus it is necessary to
+define at least one new save record type.
+
+## Proposal
+
+### New Save Record
+
+A new mandatory record type should be defined within the libxenlight Domain
+Image Format:
+
+`0x00000007: DOMAIN_XENSTORE_DATA`
+
+The format of each of these new records should be as follows:
+
+
+```
+0 1 2 3 4 5 6 7 octet
++------------------------+------------------------+
+| type | record specific data |
++------------------------+ |
+...
++-------------------------------------------------+
+```
+
+
+| Field | Description |
+|---|---|
Did you indend to add more - so | is on the same column as the onter lines?
+| `type` | 0x00000000: invalid |
+| | 0x00000001: node data |
+| | 0x00000002: watch data |
Should not the last | be some of the columns on all the lines?
+| | 0x00000003 - 0xFFFFFFFF: reserved for future use |
Looking at the spec, the command TRANSACTION_END *must* be used with an
existing transaction. As a guest would be migrate to a new domain, the
transaction ID would now be invalid.
I understand that xenstored is able to cope with it, but such behavior
is not described in the spec. So I am not sure we can expect a guest to
cope with an error value other than the ones described for the command.
+
+
+where data is always in the form of a NUL separated and terminated tuple
+as follows
+
+
+**node data**
+
+
+`<path>|<value>|<perm-as-string>|`
I don't think this would work. From the spec, <value> is a binary data
and therefore it can contain zero or nul. So you would not be able to
find out where the <perm-as-string> starts.
Regarding the <perm-as-string>, it is only describing the permission for
one domain. If multiple domains can access the node, then you would have
multiple <perm-as-string>. Do we want to transfer all the permissions,
if not how do we define which permissions should be transferred?
+
+
+`<path>` is considered relative to the domain path `/local/domain/$domid`
+and hence must not begin with `/`.
+`<path>` and `<value>` should be suitable to formulate a `WRITE` operation
+to the receiving xenstore and `<perm-as-string>` should be similarly suitable
+to formulate a subsequent `SET_PERMS` operation.
+
+**watch data**
+
+
+`<path>|<token>|`
+
+`<path>` again is considered relative and, together with `<token>`, should
+be suitable to formulate an `ADD_DOMAIN_WATCHES` operation (see below).
AFAICT, a guest is allowed to watch /. So is it a sensible thing to only
transfer relative watch?
Also, how about special watch (i.e @...)?
+
+
+### Protocol Extension
+
+Before xenstore state is migrated it is necessary to wait for any pending
+reads, writes, watch registrations etc. to complete, and also to make sure
+that xenstored does not start processing any new requests (so that new
+requests remain pending on the shared ring for subsequent processing on the
+new host). Hence the following operation is needed:
+
+```
+QUIESCE <domid>|
+
+Complete processing of any request issued by the specified domain, and
+do not process any further requests from the shared ring.
+```
+
+The `WATCH` operation does not allow specification of a `<domid>`; it is
+assumed that the watch pertains to the domain that owns the shared ring
+over which the operation is passed. Hence, for the tool-stack to be able
+to register a watch on behalf of a domain a new operation is needed:
+
+```
+ADD_DOMAIN_WATCHES <domid>|<watch>|+
+
+Adds watches on behalf of the specified domain.
+
+<watch> is a NUL separated tuple of <path>|<token>. The semantics of this
+operation are identical to the domain issuing WATCH <path>|<token>| for
+each <watch>.
+```
+
+The watch information for a domain also needs to be extracted from the
+sending xenstored so the following operation is also needed:
+
+```
+GET_DOMAIN_WATCHES <domid>|<index> <gencnt>|<watch>|*
+
+Gets the list of watches that are currently registered for the domain.
+
+<watch> is a NUL separated tuple of <path>|<token>. The sub-list returned
+will start at <index> items into the the overall list of watches and may
+be truncated (at a <watch> boundary) such that the returned data fits
+within XENSTORE_PAYLOAD_MAX.
+
+If <index> is beyond the end of the overall list then the returned sub-
+list will be empty. If the value of <gencnt> changes then it indicates
+that the overall watch list has changed and thus it may be necessary
+to re-issue the operation for previous values of <index>.
+```
+
+It may also be desirable to state in the protocol specification that
+the `INTRODUCE` operation should not clear the `<mfn>` specified such that
Not directly related to this patch, the '<mfn>' is slightly confusing
because, AFAICT, this will actually hold an GFN. To avoid spreading more
misuse, it would make sense to update the xenstore accordingly and use
the new term here.
+a `RELEASE` operation followed by an `INTRODUCE` operation form an
+idempotent pair. The current implementation of *C xentored* does this
+(in the `domain_conn_reset()` function) but this could be dropped as this
+behaviour is not currently specified and the page will always be zeroed
+for a newly created domain.
+
+
+* * *
+
+[1] See
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/designs/non-cooperative-migration.md
+[2] See
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/xenstore.txt
+[3] See
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/specs/libxl-migration-stream.pandoc
Cheers,
--
Julien Grall
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel