RE: [PATCH] docs/designs: re-work the xenstore migration document...
> -Original Message- > From: Jürgen Groß > Sent: 27 April 2020 08:42 > To: Paul Durrant ; xen-devel@lists.xenproject.org > Cc: Paul Durrant > Subject: Re: [PATCH] docs/designs: re-work the xenstore migration document... > > On 24.04.20 15:37, Paul Durrant wrote: > > From: Paul Durrant > > > > ... to specify a separate migration stream that will also be suitable for > > live update. > > > > The original scope of the document was to support non-cooperative migration > > of guests [1] but, since then, live update of xenstored has been brought > > into > > scope. Thus it makes more sense to define a separate image format for > > serializing xenstore state that is suitable for both purposes. > > > > The document has been limited to specifying a new image format. The > > mechanism > > for acquiring the image for live update or migration is not covered as that > > is more appropriately dealt with by a patch to docs/misc/xenstore.txt. It is > > also expected that, when the first implementation of live update or > > migration > > making use of this specification is committed, that the document is moved > > from > > docs/designs into docs/specs. > > Can we please add a general remark: > > Records depending other records (e.g. watch records referencing a > connection record via a connection-id, or node records for a node > being a child of another node in the stream) must always be located > after the record they depend on. > Sure. I'll call that out at the beginning of the records section. Paul > > Juergen
Re: [PATCH] docs/designs: re-work the xenstore migration document...
On 24.04.20 15:37, Paul Durrant wrote: From: Paul Durrant ... to specify a separate migration stream that will also be suitable for live update. The original scope of the document was to support non-cooperative migration of guests [1] but, since then, live update of xenstored has been brought into scope. Thus it makes more sense to define a separate image format for serializing xenstore state that is suitable for both purposes. The document has been limited to specifying a new image format. The mechanism for acquiring the image for live update or migration is not covered as that is more appropriately dealt with by a patch to docs/misc/xenstore.txt. It is also expected that, when the first implementation of live update or migration making use of this specification is committed, that the document is moved from docs/designs into docs/specs. Can we please add a general remark: Records depending other records (e.g. watch records referencing a connection record via a connection-id, or node records for a node being a child of another node in the stream) must always be located after the record they depend on. Juergen
Re: [PATCH] docs/designs: re-work the xenstore migration document...
On 24.04.20 18:20, Paul Durrant wrote: -Original Message- From: Julien Grall Sent: 24 April 2020 17:04 To: Jürgen Groß ; Paul Durrant ; xen-devel@lists.xenproject.org Cc: Paul Durrant Subject: Re: [PATCH] docs/designs: re-work the xenstore migration document... Hi, On 24/04/2020 16:59, Jürgen Groß wrote: On 24.04.20 17:44, Julien Grall wrote: If I extend the record and do a downgrade I'm losing the information, too, as the old version won't read it in any case. BTW, extending the record is no problem, as the length is stored in the header. Unknown records (and extensions) will be just ignored when reading. That's very much up to the implementation. An implementation may decide to bail out if the record is not an exact size. It won't know. The record will be whatever size it says it is, and if the format doesn't match what the implementation was expecting then it'll probably crash. In your case when reusing the paddings and doing a downgrade you would crash, as the paddings are no longer zero. In case a downgrade should not be done due to vital information loss then you should just not do it. Of course, however I don't think a user will necessarily know it should not do it. So how do you protect against misuse? The stream is versioned. If information is vital then I'd expect the version to be bumped, which should prevent a downgrade. Uh, this is problematic. I agree that a downgrade will be prevented, but merely via a crash. We won't have both versions active in parallel, but the version we are updating to will make it or not. There is no way back. The general rule should be to be as forgiving to errors in the stream as possible in order _not_ to have a crashing xenstored due to strict parameter testing. And in case there is a potential of problems it needs to be documented and then its up to the user to follow the documentation. Juergen
RE: [PATCH] docs/designs: re-work the xenstore migration document...
> -Original Message- > From: Julien Grall > Sent: 24 April 2020 17:04 > To: Jürgen Groß ; Paul Durrant ; > xen-devel@lists.xenproject.org > Cc: Paul Durrant > Subject: Re: [PATCH] docs/designs: re-work the xenstore migration document... > > Hi, > > On 24/04/2020 16:59, Jürgen Groß wrote: > > On 24.04.20 17:44, Julien Grall wrote: > > If I extend the record and do a downgrade I'm losing the information, > > too, as the old version won't read it in any case. BTW, extending the > > record is no problem, as the length is stored in the header. Unknown > > records (and extensions) will be just ignored when reading. > > That's very much up to the implementation. An implementation may decide > to bail out if the record is not an exact size. > It won't know. The record will be whatever size it says it is, and if the format doesn't match what the implementation was expecting then it'll probably crash. > > > > In your case when reusing the paddings and doing a downgrade you would > > crash, as the paddings are no longer zero. > > > > In case a downgrade should not be done due to vital information loss > > then you should just not do it. > > Of course, however I don't think a user will necessarily know it should > not do it. So how do you protect against misuse? > The stream is versioned. If information is vital then I'd expect the version to be bumped, which should prevent a downgrade. Paul
Re: [PATCH] docs/designs: re-work the xenstore migration document...
Hi, On 24/04/2020 16:59, Jürgen Groß wrote: On 24.04.20 17:44, Julien Grall wrote: If I extend the record and do a downgrade I'm losing the information, too, as the old version won't read it in any case. BTW, extending the record is no problem, as the length is stored in the header. Unknown records (and extensions) will be just ignored when reading. That's very much up to the implementation. An implementation may decide to bail out if the record is not an exact size. In your case when reusing the paddings and doing a downgrade you would crash, as the paddings are no longer zero. In case a downgrade should not be done due to vital information loss then you should just not do it. Of course, however I don't think a user will necessarily know it should not do it. So how do you protect against misuse? Cheers, -- Julien Grall
Re: [PATCH] docs/designs: re-work the xenstore migration document...
On 24.04.20 17:44, Julien Grall wrote: On 24/04/2020 15:51, Jürgen Groß wrote: On 24.04.20 16:38, Julien Grall wrote: Hi, On 24/04/2020 15:26, Jürgen Groß wrote: +``` + 0 1 2 3 4 5 6 7 octet ++---+---+---+---+---+---+---+---+ +| type | len | ++---+---+ +| body +... +| | padding (0 to 7 octets) | ++---+---+ +``` + +NOTE: padding octets here and in all subsequent format specifications must be + zero, unless stated otherwise. What about: "... are written as zero and should be ignored on read." I would rather not say "ignored" because it doesn't allow us to extend the record if needed in a safe way. The read side should abort if it sees an other value than 0 in the padding. I'd rather ignore unknown fields. This allows to add optional data without having to special case it. You will need a special case for 0 in any case. 0 will need to have the "no optional data" semantics. Writing zeroes is the important part here, of course. Ignoring those fields would e.g. enable a downgrade more easily, while aborting could make that impossible. You are assuming the fields may contain optional data. Now imagine, we realize in a few months we missed some important fields. How would you describe the required fields? I can see two ways: 1) Re-using the padding fields if possible 2) Extending the record If you re-use the padding fields, then when you downgrade you may lose information. If you extend the size of the record, then you can't downgrade. If I extend the record and do a downgrade I'm losing the information, too, as the old version won't read it in any case. BTW, extending the record is no problem, as the length is stored in the header. Unknown records (and extensions) will be just ignored when reading. In your case when reusing the paddings and doing a downgrade you would crash, as the paddings are no longer zero. In case a downgrade should not be done due to vital information loss then you should just not do it. Juergen
Re: [PATCH] docs/designs: re-work the xenstore migration document...
On 24/04/2020 15:51, Jürgen Groß wrote: On 24.04.20 16:38, Julien Grall wrote: Hi, On 24/04/2020 15:26, Jürgen Groß wrote: +``` + 0 1 2 3 4 5 6 7 octet ++---+---+---+---+---+---+---+---+ +| type | len | ++---+---+ +| body +... +| | padding (0 to 7 octets) | ++---+---+ +``` + +NOTE: padding octets here and in all subsequent format specifications must be + zero, unless stated otherwise. What about: "... are written as zero and should be ignored on read." I would rather not say "ignored" because it doesn't allow us to extend the record if needed in a safe way. The read side should abort if it sees an other value than 0 in the padding. I'd rather ignore unknown fields. This allows to add optional data without having to special case it. You will need a special case for 0 in any case. Writing zeroes is the important part here, of course. Ignoring those fields would e.g. enable a downgrade more easily, while aborting could make that impossible. You are assuming the fields may contain optional data. Now imagine, we realize in a few months we missed some important fields. How would you describe the required fields? I can see two ways: 1) Re-using the padding fields if possible 2) Extending the record If you re-use the padding fields, then when you downgrade you may lose information. If you extend the size of the record, then you can't downgrade. Cheers, -- Julien Grall
Re: [PATCH] docs/designs: re-work the xenstore migration document...
On 24.04.20 16:38, Julien Grall wrote: Hi, On 24/04/2020 15:26, Jürgen Groß wrote: +``` + 0 1 2 3 4 5 6 7 octet ++---+---+---+---+---+---+---+---+ +| type | len | ++---+---+ +| body +... +| | padding (0 to 7 octets) | ++---+---+ +``` + +NOTE: padding octets here and in all subsequent format specifications must be + zero, unless stated otherwise. What about: "... are written as zero and should be ignored on read." I would rather not say "ignored" because it doesn't allow us to extend the record if needed in a safe way. The read side should abort if it sees an other value than 0 in the padding. I'd rather ignore unknown fields. This allows to add optional data without having to special case it. Writing zeroes is the important part here, of course. Ignoring those fields would e.g. enable a downgrade more easily, while aborting could make that impossible. And in case a version would write non-zero bytes (e.g. due to a bug) this version could never be live-updated. Juergen
Re: [PATCH] docs/designs: re-work the xenstore migration document...
Hi, On 24/04/2020 15:26, Jürgen Groß wrote: +``` + 0 1 2 3 4 5 6 7 octet ++---+---+---+---+---+---+---+---+ +| type | len | ++---+---+ +| body +... +| | padding (0 to 7 octets) | ++---+---+ +``` + +NOTE: padding octets here and in all subsequent format specifications must be + zero, unless stated otherwise. What about: "... are written as zero and should be ignored on read." I would rather not say "ignored" because it doesn't allow us to extend the record if needed in a safe way. The read side should abort if it sees an other value than 0 in the padding. Cheers, -- Julien Grall
Re: [PATCH] docs/designs: re-work the xenstore migration document...
On 24.04.20 15:37, Paul Durrant wrote: From: Paul Durrant ... to specify a separate migration stream that will also be suitable for live update. The original scope of the document was to support non-cooperative migration of guests [1] but, since then, live update of xenstored has been brought into scope. Thus it makes more sense to define a separate image format for serializing xenstore state that is suitable for both purposes. The document has been limited to specifying a new image format. The mechanism for acquiring the image for live update or migration is not covered as that is more appropriately dealt with by a patch to docs/misc/xenstore.txt. It is also expected that, when the first implementation of live update or migration making use of this specification is committed, that the document is moved from docs/designs into docs/specs. [1] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/designs/non-cooperative-migration.md Signed-off-by: Paul Durrant --- Juergen Gross Andrew Cooper George Dunlap Ian Jackson Jan Beulich Julien Grall Stefano Stabellini Wei Liu --- docs/designs/xenstore-migration.md | 472 +++-- 1 file changed, 309 insertions(+), 163 deletions(-) diff --git a/docs/designs/xenstore-migration.md b/docs/designs/xenstore-migration.md index 6ab351e8fe..c96bad48eb 100644 --- a/docs/designs/xenstore-migration.md +++ b/docs/designs/xenstore-migration.md @@ -3,254 +3,400 @@ ## Background The design for *Non-Cooperative Migration of Guests*[1] explains that extra -save records are required in the migrations stream to allow a guest running -PV drivers to be migrated without its co-operation. Moreover the save -records must include details of registered xenstore watches as well as -content; information that cannot currently be recovered from `xenstored`, -and hence some extension to the xenstore protocol[2] will also be required. - -The *libxenlight Domain Image Format* specification[3] already defines a -record type `EMULATOR_XENSTORE_DATA` but this is not suitable for -transferring xenstore data pertaining to the domain directly as it is -specified such that keys are relative to the path -`/local/domain/$dm_domid/device-model/$domid`. Thus it is necessary to -define at least one new save record type. +save records are required in the migrations stream to allow a guest running PV +drivers to be migrated without its co-operation. Moreover the save records must +include details of registered xenstore watches as well ascontent; information +that cannot currently be recovered from `xenstored`, and hence some extension +to the xenstored implementations will also be required. + +As a similar set of data is needed for transferring xenstore data from one +instance to another when live updating xenstored this document proposes an +image format for a 'migration stream' suitable for both purposes. ## Proposal -### New Save Record +The image format consists of a _header_ followed by 1 or more _records_. Each +record consists of a type and length field, followed by any data mandated by +the record type. At minimum there will be a single record of type `END` +(defined below). -A new mandatory record type should be defined within the libxenlight Domain -Image Format: +### Header -`0x0007: DOMAIN_XENSTORE_DATA` +The header identifies the stream as a `xenstore` stream, including the version +of the specification that it complies with. -An arbitrary number of these records may be present in the migration -stream and may appear in any order. The format of each record should be as -follows: +All fields in this header must be in _big-endian_ byte order, regardless of +the setting of the endianness bit. ``` 0 1 2 3 4 5 6 7octet +---+---+---+---+---+---+---+---+ -| type | record specific data | -+---+ | -... -+---+ +| ident | ++---+---| +| version | flags | ++---+---+ ``` -where type is one of the following values +| Field | Description | +|---|---| +| `ident` | 0x78656e73746f7265 ('xenstore' in ASCII) | +| | | +| `version` | 0x0001 (the version of the specification) | +| | | +| `flags` | 0 (LSB): Endianness: 0 = little, 1 = big | +| | | +| | 1-31: Reserved (mu